THE PERCEPTUAL FACTOR


L. L. THURSTONE 
The University of Chicago 


A further study of the perceptual factor, previously isolated 
in a factor analysis of a battery of fifty-six tests, is made in a man- 
ner designed also to determine whether the same seven primaries 
would be found in a different population of subjects and with an- 
other battery of tests. The tests are described, and the results of the 
analysis are given in detail. Much attention is given to the matter 
of the orthogonality of primary factors and to their psychological 
meaningfulness. 


In a factorial study of fifty-six psychological tests* there were 
isolated seven primary factors whose interpretation seemed quite 
clear. The psychological interpretation of several other factors was 
not immediately evident. The clearest factors were the verbal factor 
V, the number factor N, the space factor S, and the memory factor M. 
The factors which were less clearly defined were the perceptual factor 
P, the word factor W, and the inductive factor I. The present
study was undertaken in order to learn more about the nature of the 
perceptual factor P. This factor had appreciable saturations in the 
following tests: Verbal Classification, Word Grouping, Disarranged 
Sentences, Identical Forms, and Picture Recall. The saturations indi- 
cated that from one-fifth to one-third of the total variance of these 
tests was attributable to the factor P. Several other tests with factor 
loadings of .40 could be used in studying the psychological nature of 
the factor. The highest saturation was found in Identical Forms, 
since one-third of its variance was attributable to the perceptual fac- 
tor. 

A study of these tests indicated that the perceptual factor might 
consist in a facility to perceive detail even when it is buried among 
perceptual distractors. It might involve speed as an essential charac- 
teristic, but this impression may be due to the fact that the perceptual 
tests were simpler than the tests which were heavily saturated with 
other factors. The interpretation of primary factors is made largely 
in terms of the kind of thinking that is involved in doing the tasks. 

*Thurstone, L. L., Primary Mental Abilities, to be published by The University
of Chicago as the first number of the Psychometric Monograph Series, April,

The characteristic that seemed to be common to all of the tests
that were heavily saturated with the perceptual factor P was the
readiness to discover and to identify perceptual detail. The present
experiment was planned to investigate this hypothesis. If a perceptual 
factor of this general nature exists, then it should be possible to pre- 
dict that certain new tests should be heavily saturated with this factor 
even though they might be otherwise disparate in superficial appear-
ance.

A test battery was assembled which included two or three tests
for each of seven primary abilities that were isolated in the previous 
study, and also nine new tests that were planned to be tests of the 
perceptual factor with some variation in immediate content. The to- 
tal battery so selected was given to a new group of subjects for a new 
factorial analysis. The first question was then to ascertain whether 
the old seven primary factors would again make their appearance as 
they should, since each of them was here represented by two or three 
tests. Then there was the further question whether the new tests 
that were designed so as to be saturated with the perceptual factor 
would appear together with the previous best test for this factor, 
namely, Identical Forms. If they did not hang together, then their 
saturations with the other factors should enable us to identify the 
nature of the new tests, and we should be forced to guess again about 
the psychological nature of the factor P. 

The new test battery was assembled partly from the previous 
battery of fifty-six tests. These tests were as follows:* Addition (31), 
Multiplication (33), and Division (34) for the Number Factor N; An- 
agrams (15) and Disarranged Words (12) for the Word Factor W; 
Areas (29) and Tabular Completion (35) for the Induction Factor I; 
Opposites (10), Completion (11), Verbal Analogies (41) and Word 
Grouping (7) for the Verbal Factor V; Flags (20) and Pursuit (27) 
for the Space Factor S; Word-Number (46) and Initials (47) for the 
Memory Factor M; Identical Forms (26) and nine new tests for the 
Perceptual Factor P; Arithmetical Reasoning (39) and Reasoning 
(40) for the tentative factor R. In the previous study the Word 
Grouping test had high saturation in both the perceptual and the ver- 
bal factors P and V. 

The new tests that were specially designed for this experiment 
will be described here briefly. 

Scattered X’s. This test has one page of instructions and fore- 
exercise followed by ten letter-sized pages of pied letters. The sub- 
ject is asked to ring every letter x. Each page has twenty rows with 
thirty pied characters in each row. 

*The numbering of the tests corresponds to that of the previous battery of
fifty-six tests.

Letter A. The test has one page of instructions and fore-exercise
followed by ten letter-sized pages. Each page has five columns with
forty words in each column. The subject is asked to check every word 
that contains the letter a. The test contains fifty columns of words. 
This test and the previous test are essentially cancellation tests. 

Identical Names. One page of instructions and fore-exercise fol- 
lowed by a test of ten letter-sized pages. Each page has four columns 
of names with initials. At the top of each column is a name with 
initials. The subject is asked to find the top name repeated somewhere 
in the column and to check it. The test contains forty columns of 
names. The names are not arranged in alphabetical order. 

Identical Numbers. One page of instructions and fore-exercise 
and a test of ten letter-sized pages. Each page has eight columns of 
forty three-place numbers. Each column is headed by a number. The 
subject is asked to find the first number repeated somewhere in the 
column and to ring it. The test contains eighty such columns. 

Highest Number. One page of instructions and fore-exercise and
a test of ten pages. Each page contains eight columns of forty three- 
place numbers. The subject is asked to ring the highest number in 
each column. The test contains eighty columns of numbers. 

Verbal Enumeration. One page of instructions and fore-exercise 
and a test of ten letter-sized pages. Each page contains five columns 
of forty words. At the top of each column is the name of a category 
of things such as flowers, clothing, furniture, trees, grains, vehicles, 
spices, coins, furs, diseases, beverages, and so on. The subject is asked 
to check four words in each column that belong to the category indi- 
cated at the top of the column. 

Concrete Association. One page of instructions and fore-exercise 
and a test of ten letter-sized pages. Each page contains five columns 
with forty words in each column. The subject checks four words in 
each column that are closely associated with the heading of the column. 
Examples of categories for the columns are politics, garage, estate, 
student, farm, bank, radio, hospital, government, river, business, law- 
suit, lake, winter, and so on. In each column there are four words 
closely associated with the category for the column. For example, the 
column headed river contains the four words canoe, rapids, levee, and 
current. The subject checks four associations in each column of forty 
words. The test contains fifty such columns. 

Abstract Classification. One page of instructions and fore-exer- 
cise and a test of ten letter-sized pages. Each page contains five col- 
umns with forty words in each column. Each column is headed by a 
word that designates a category. Examples of these are up, front, 
lightness, within, again, angular, narrow, etc. The subject is asked to 
check four words in each column that belong in the category for the 
column. For example, in the column headed angular the four response
words are corner, jagged, notch, and gable. This test and the two 
previous tests are of the same character, but they vary in degree of 
abstraction. 

Designs. One page of instructions and fore-exercise and a test of 
eleven pages of designs. Ten of these designs are shown in Figure 1. 
Each page contains ten rows with ten designs in each row. The subject
checks every design which contains the capital letter sigma, Σ.
He is shown the letter Σ, which is called the “model,” and he is asked
to check every design which contains the model.

The object of the test was to determine whether the ability to 
extract a part of a design which is perceived as a whole is character- 
istic of the perceptual factor. The perception of the model within the 
design requires an act of abstraction which might, or might not, be 
involved in the perceptual factor. In the Scattered X’s test the subject 
looks for the letter x as in a cancellation test. But in that test the 
figure that he is looking for is presented as a whole. In the Designs 
test the figure that he looks for is presented as a part of a larger total 
figure. The task is tedious for most subjects, since the “model” must 
be extracted, or abstracted, as it were, from each design. 

Before giving the new test battery we made some estimates of 
the relative degree of saturation of the perceptual factor which could 
be expected according to our tentative formulation of the nature of the 
perceptual factor. It seemed that Scattered X’s should have a high sa- 
turation with the perceptual factor. Verbal Enumeration and Con- 
crete Association should have a higher saturation than Abstract Clas- 
sification because the latter test involves clearly other intellectual fac- 
tors besides speed of perception. The Identical Numbers and Identical 
Names should have high saturation and be comparable as to the per- 
ceptual loadings. The Letter A requires of the subject the abstraction 
of the letter a from meaningful words, and the Designs has the same 
characteristic, namely, that the “model” which the subject is looking 
for is imbedded in a design that is perceived as a unit. In both cases 
the subject must extract the object of his search from the larger unit 
that he is perceiving, namely, the meaningful word in one case and
the complete design in the other case. These two tests were question- 
able in relation to the nature of the perceptual factor. The Identical 
Forms should have an appreciable saturation in the perceptual factor 
since it had the highest saturation with the factor P in the previous 
experiment in which the factor was first tentatively recognized.

The new battery of twenty-seven time limit tests was given to a 
class of seniors at the Lane Technical High School in Chicago in the 
spring of 1936. The tests were given in five sessions to each group of 
about forty students. There were 215 subjects who completed the 
whole battery, and their test records were used in the correlational 
analysis. Table 1 is a summary of the distributions of the twenty- 
seven tests in the present battery. The table shows the name of each 
test, its scoring formulae, the arithmetic mean, the upper and lower 
quartiles, the standard deviation, and the estimated reliability of each 
test by the Spearman-Brown correction formula. The tests were fair- 
ly reliable except for two tests, namely, Arithmetical Reasoning, 
which had twenty problems, and Reasoning, which had twenty syllo- 
gisms. Some of the distributions were skewed. 
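
The Spearman-Brown correction mentioned above can be sketched as follows. This is only an illustration: it assumes the usual split-half case, in which the correlation between two half-tests is stepped up to an estimate for the full-length test. The function name and the sample value are hypothetical and are not taken from Table 1.

```python
def spearman_brown(r_part: float, length_factor: float = 2.0) -> float:
    """Estimate the reliability of a lengthened test from the part correlation.

    length_factor=2 corresponds to the split-half case (assumed here).
    """
    return length_factor * r_part / (1.0 + (length_factor - 1.0) * r_part)


# Hypothetical correlation between two half-tests.
estimated = spearman_brown(0.80)
print(round(estimated, 3))
```

The correction always raises a positive part correlation, which is why a short test of twenty items can still show a modest corrected reliability.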

The scores were arranged to be less than 100 so that they could 
be tabulated in two columns of a Hollerith card for each test. Negative 
scores were avoided by adding an arbitrary constant as shown in the 
scoring formulae of Table 1. The cross products were obtained by a
Hollerith multiplier, and the Pearson product-moment coefficients 
were determined from these products. The inter-test correlations are 
shown in Table 2. All the new tests for this battery have code numbers 
above 60. The tests from the previous battery of 56 tests have code 
numbers below 60. 
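
The step from accumulated cross products to a product-moment coefficient can be sketched in a few lines. The raw-sums formula below is the standard one; the function name and the five pairs of two-digit scores are hypothetical and serve only to show the arithmetic that the tabulating equipment supported.

```python
import math


def pearson_from_sums(n, sum_x, sum_y, sum_xx, sum_yy, sum_xy):
    """Pearson product-moment coefficient from raw sums and cross products."""
    cov = n * sum_xy - sum_x * sum_y
    var_x = n * sum_xx - sum_x ** 2
    var_y = n * sum_yy - sum_y ** 2
    return cov / math.sqrt(var_x * var_y)


# Hypothetical two-digit scores of five subjects on two tests.
x = [12, 25, 31, 47, 58]
y = [20, 22, 35, 41, 60]

r = pearson_from_sums(
    len(x),
    sum(x),
    sum(y),
    sum(a * a for a in x),
    sum(b * b for b in y),
    sum(a * b for a, b in zip(x, y)),  # the cross products
)
print(round(r, 3))
```

Adding an arbitrary constant to every score of a test, as was done to avoid negative scores, leaves the coefficient unchanged.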

The correlation table was factored by the centroid method,* and 
the resulting centroid matrix is shown in Table 3. The mean of the 
residuals was .00108, and the range was from +.07 to -.07. The
standard deviation of the distribution of residuals was .0243. 
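
The first step of a centroid extraction can be sketched as below. This is a simplified illustration, not the full procedure of the reference cited: it assumes that communality estimates stand in the diagonal and that all column sums are positive, so the sign-reflection step is omitted. The three-test matrix is hypothetical and was built from a single factor with loadings .8, .6, and .5.

```python
import math


def first_centroid_factor(r):
    """Return first-centroid loadings and the residual matrix.

    `r` is a symmetric correlation matrix (list of lists) with communality
    estimates in the diagonal. Positive column sums are assumed.
    """
    n = len(r)
    col_sums = [sum(r[i][j] for i in range(n)) for j in range(n)]
    total = sum(col_sums)
    # Each loading is the column sum divided by the root of the grand sum.
    loadings = [s / math.sqrt(total) for s in col_sums]
    # Residuals are what remains after removing the first factor.
    residual = [
        [r[i][j] - loadings[i] * loadings[j] for j in range(n)]
        for i in range(n)
    ]
    return loadings, residual


# Hypothetical one-factor matrix (loadings .8, .6, .5).
r_matrix = [
    [0.64, 0.48, 0.40],
    [0.48, 0.36, 0.30],
    [0.40, 0.30, 0.25],
]
loadings, residual = first_centroid_factor(r_matrix)
print([round(a, 3) for a in loadings])
```

With real data the residuals do not vanish after one factor, and the same step is repeated on the residual matrix until the residuals are as small as those reported above.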

The higher communalities are between .70 and .80. The commun- 
ality represents the variance attributable to the common factors. The 
lowest communalities are about .40. The tests with low communalities 
are not satisfactory and need considerable improvement. Several of 
the new tests were scored not only for the number of right responses 
within the time limit, but also for the ratio of the correct responses to 
the total number of attempts. There are four such tests. In one of 
these, No. 74, the primary factors account for only 28 per cent of the 
total variance of the test. 
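
The communality of a test is the sum of the squares of its loadings on the common factors. A minimal sketch, with a hypothetical row of loadings that is not taken from Table 3 or Table 4:

```python
def communality(loadings):
    """Sum of squared common-factor loadings for one test."""
    return sum(a * a for a in loadings)


# Hypothetical loadings of one test on four common factors.
row = [0.10, 0.45, 0.05, 0.25]
h2 = communality(row)
unique_part = 1.0 - h2  # specific variance plus error variance
print(round(h2, 4), round(unique_part, 4))
```

A test whose common factors account for so small a share leaves most of its variance to specific and chance sources.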


*Thurstone, L. L., The Vectors of Mind, The University of Chicago Press,
1935, Chapter 3.

Improved rotational methods were used in determining the simple
configuration of the present test battery. Each of the coordinate 
planes was determined independently. The new rotational methods 
will be described in a separate publication. They give the same final 
result as the older methods, but the new methods are more economi- 
cal of time. The rotated configuration is represented in Table 4. 

Table 5 shows the matrix of the transformation A from the centroid
matrix F_0 in Table 3 to the rotated matrix F_p of Table 4. This
relation can be stated in the matrix equation

    F_0 A = F_p.    (1)

The cosines of the angular separations of the unit reference vectors
are their scalar products. These may be written in the form

    A'A = N,    (2)

where N is a symmetric matrix of cosines or correlations between the
reference vectors.

The direction cosines of the primary vectors are proportional to
the rows of the inverse A^{-1}.* Hence the direction cosines may be
expressed by DA^{-1} = M, where D is a premultiplying diagonal matrix.
The entries in D are so chosen as to normalize the rows of the product
matrix M.

The correlations between the primary traits in the experimental
population are the cosines of the angular separations of the unit
primary vectors. These correlations or scalar products are given by the
equation

    MM' = R_p,    (3)

where R_p is the matrix of the correlations. This can be written

    (DA^{-1})(DA^{-1})' = R_p,    (4)

or

    DA^{-1}(A')^{-1}D = R_p.    (5)

By (2)

    A^{-1}(A')^{-1} = N^{-1},    (6)

and hence

    DN^{-1}D = R_p.    (7)

In order to obtain the correlations R_p between the primary traits, one
computes the inverse of the symmetric matrix A'A. The diagonal matrix
D is then written so as to reduce the diagonals of N^{-1} to unity. The
matrix R_p is shown in Table 6.
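
The computation in equations (2) through (7) can be followed numerically in two dimensions. The sketch below is an illustration under stated assumptions: the transformation matrix is a hypothetical 2 x 2 matrix with unit-length columns whose reference vectors have a cosine of .20, and the function name is not from the paper. It forms N = A'A, inverts it, and rescales the inverse by D so that the diagonal becomes unity.

```python
import math


def primary_correlations(a):
    """Correlations between primaries, D N^{-1} D, for a 2 x 2 matrix `a`."""
    # N = A'A: cosines between the unit reference vectors (columns of A).
    n = [
        [sum(a[k][i] * a[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]
    det = n[0][0] * n[1][1] - n[0][1] * n[1][0]
    n_inv = [
        [n[1][1] / det, -n[0][1] / det],
        [-n[1][0] / det, n[0][0] / det],
    ]
    # D reduces the diagonal of N^{-1} to unity.
    d = [1.0 / math.sqrt(n_inv[i][i]) for i in range(2)]
    return [[d[i] * n_inv[i][j] * d[j] for j in range(2)] for i in range(2)]


# Hypothetical transformation: two unit columns separated by a cosine of .20.
c = 0.20
a_matrix = [
    [1.0, c],
    [0.0, math.sqrt(1.0 - c * c)],
]
r_primary = primary_correlations(a_matrix)
print([[round(v, 3) for v in row] for row in r_primary])
```

In this two-dimensional case the cosine between the primary vectors has the same size as the cosine between the reference vectors but the opposite sign, which shows why the two sets of correlations must not be confused.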


*An economical method of computing the inverse has been devised by Mr. 
Ledyard Tucker, which he will describe in a forthcoming paper. 
Table 4 is of interest in the present experiment. The coordinate
planes have been so rotated as to maximize the number of nearly van- 
ishing projections. There are no significant negative projections. It 
is of major interest to ascertain whether the same factors that were 
isolated as primary in the previous experiment can be identified in the 
present experiments. The test battery was considerably altered, and 
the tests were given to a new population for a new factorial analysis. 

The order of the columns is of no significance. They are given 
here in the order in which they happened to appear in the computa- 
tions. Each plane was set in accordance with the configuration re- 
vealed in the plots of pairs of columns. A comparison will be made 
between the factors previously determined and the factors in the pres- 
ent battery. Some of the factors are clearly the same, while several 
new factors appeared in the present experiment. The residual plane in 
Table 4 is not identified. 

Inspection of the first column V leaves little doubt that the first 
factor is verbal in character. The highest saturations are in Abstract 
Classification, Completion, Opposites, Verbal Analogies, Verbal Enum- 
eration, and Word Grouping. These tests characterize the verbal fac- 
tor V. 

In the inspection and comparison of factor loadings of a test it 
must be recalled that the square of the saturation is the variance at- 
tributable to the factor. Loadings below .20 or .30 are unstable since 
they represent less than ten per cent of the variance of the test. A 
shift in saturation from .30 to .45 represents a shift of about ten per 
cent of the variance of a test. With the improvement of tests it should 
be possible to reduce their complexity so that a higher and higher pro- 
portion of their variance is attributable to a single primary factor. 
All that we can expect in the present state of knowledge is to identify 
the principal landmarks among the human abilities. There is some 
satisfaction in finding that the primary factors in a simple configura- 
tion determined by one population are essentially the same as those 
found in another population. 
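
The arithmetic behind these statements about loadings is brief. In the sketch below the two loadings are the ones named in the text; the function name is illustrative only.

```python
def variance_share(loading: float) -> float:
    """Proportion of a test's variance attributable to one factor."""
    return loading ** 2


low = variance_share(0.30)   # a loading of .30
high = variance_share(0.45)  # a loading of .45
shift = high - low           # change in variance share
print(round(low, 4), round(high, 4), round(shift, 4))
```

A loading of .30 thus corresponds to nine per cent of the variance, and the shift from .30 to .45 to about eleven per cent, in agreement with the rough figures given above.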

Column N has the highest saturations in Addition, Division, High- 
est Number, Identical Numbers, and Multiplication. This is evidently 
the number factor N that was found in the previous battery. Here, as 
before, the simple arithmetical processes carry the highest saturations 
in this factor. 

Column S has the highest saturations in Areas, Designs, Flags, 
Identical Forms, and Pursuit. All but one of these tests were used in 
the previous battery, and they characterize the visual space factor S. 
It is not surprising to find the new test Designs in this list. We had 
expected to find a spatial component in this test, but it was inserted in 
the new battery to determine whether it would also appear among
the perceptual tests. The Identical Forms made a shift, a reduction of 
its perceptual component and an increase in its spatial component. It 
had an appreciable saturation in both of these factors in the previous 
study, and it retained both factors in the present battery. The angular 
displacement of the test vector for Identical Forms in the plane of 
these two factors, Perception and Space, is about 28 degrees. These 
shifts may be due in part to a shift in the abilities that the subjects 
used in doing the tasks. The former group was the most highly select- 
ed group of subjects that the author has ever worked with. The pres- 
ent group was a class of seniors in a vocational high school. Consider- 
able work with individual subjects will be required to ascertain wheth- 
er some of these tasks can be performed by the vicarious functioning 
of one ability for another ability that is normal for the task. In the 
present case the less gifted subjects relied more on visual imagery for 
a task that seemed to be more immediate and perceptual for the gifted 
subjects. This shift in the abilities used for any particular perform- 
ance may be expected also within any group of subjects. 

When a test shows saturation with two or more factors we have 
no means of knowing by factorial analysis whether the several abili- 
ties enter into the test for every subject, or whether some subjects use 
one ability and other subjects use other abilities for the same per- 
formance. A study with individual subjects could reveal these dif- 
ferences, especially when the subjects indicate how they solve each 
problem. One solution for this ambiguity is to develop tests which in- 
volve only one factor. Since it seems desirable to work toward tests 
which involve mostly one primary ability and very little of the others, 
we may eventually have test batteries that are highly specialized as
to the functions involved so that individual differences will be con- 
spicuous. 

Column M has only two tests with appreciable saturations. These 
are the two memory tests so that the identification is evidently the 
memory factor M. No attempt was made in this study to analyze the 
memory factor. That will be reserved for future experiments. 

Column W has appreciable saturations in three tests, namely, 
Anagrams, Disarranged Words, and Identical Names. Two of these 
tests were retained for this battery to represent the Word factor W, 
and the other test was one of the new ones. The new test fits the 
previous description of this factor. Both verbal factors appeared in 
this battery as in the previous one. The psychological differentiation 
between these two verbal factors needs considerable further study. 

Column I does not have many large saturations, and it is neces-
sary to consider tests with low saturations in this factor in order to
make a tentative interpretation. This includes the loadings higher
than .30. The list then includes Areas, Arithmetical Reasoning, 
Reasoning, Tabular Completion, and Verbal Analogies. These tests all 
involve reasoning, and the common factor seems to be inductive. This 
factor has therefore been identified as Induction. An experiment is 
now in progress with ten new tests of induction in addition to those 
here listed. The resulting analysis should reveal with more certainty 
the nature of this primary mental ability. 

Column P has high saturations for all of the new tests except De- 
signs, and this factor has, therefore, been identified as perceptual. 
The new tests were designed for this battery in order to determine 
whether they would have appreciable projections on the same primary 
vector. This has happened so that the uniqueness of this perceptual 
factor seems quite certain. The previous battery did not have several 
good tests for this factor. The new tests were designed so as to ac- 
centuate the perceptual factor if it existed. The Identical Forms 
shifted toward the visual space factor, and the Word Grouping shifted 
toward the verbal factor V, so that the identification of the new per- 
ceptual factor with the one previously suspected is not clear. How- 
ever, it does seem clear that the new tests introduced into this experi- 
ment are closely related by a conspicuous common factor. This factor 
we shall denote P. 

The interpretation of the factor P will be aided by comparing the 
relative saturations of the nine new tests with this factor. The highest 
saturations are found in Concrete Association and in Verbal Enumera- 
tion. The third test in this sequence was Abstract Classification which 
was designed so as to be similar in character but with more abstract 
material. The ratings of these three tests as regards the perceptual
factor are as we had expected. The two simple ones rank highest while
the more abstract form of the test is less satisfactory as an index of 
this factor. 

The three tests Highest Number, Identical Numbers, and Identical
Names rank next, since about half of their common factor variance is 
accounted for by the perceptual factor. The fact that some of these 
tests are numerical and others verbal in immediate content does not 
affect their saturation with the perceptual factor. The Scattered X’s 
was thought to be a simple task which should have a high saturation 
with the factor P, but it does not rank so high in this factor as was 
anticipated. The common factor in these tests may be fluency of as- 
sociation with perceptual material. Visual acuity is probably not in- 
volved in this factor. It is probable that this factor is of considerable 
significance in determining the speed of reading, and it may be in- 
volved in reading disabilities. Further experimental study should be 
made with the present battery augmented by new tests of visual dis-
crimination with liminal and with supraliminal discriminations, vari-
ous reading tests, communality of association, tests involving the
identification of a designated object with varying degrees of percep- 
tual distraction in the same modality and in different modalities, and 
tests of visual acuity. Such investigations will delimit each factor, 
and they will probably disclose new ones. 

Another column shows only two significant saturations, namely, 
the two scores for the Designs test. This column is denoted “Doublet
Designs.” This is not sufficient variety of test material to identify this
factor. The test does not belong with the factor P. It might be of some 
significance that the number of right responses in unit time and the 
ratio of the right responses to the total number of attempts both have 
high saturations on this factor. We regard this factor as a doublet of 
unknown psychological nature. One factorial column has appreciable 
saturations in the four ratio scores. It is denoted “Ratios.” The fac- 
tor involved may be concerned with Accuracy or Caution. More data 
should be available on a variety of test material for this factor before 
it can be identified. This finding does suggest, however, that the rela- 
tive frequency of errors may represent a unique trait. 

In the interpretation of factorial analyses the assumption of lin- 
earity is an important limitation. It is unlikely that the mental abil- 
ities combine linearly except as a first approximation. Consequently 
the large saturations of the tests should be studied for the purpose of 
discovering the principal landmarks among the mental abilities and 
not with the hope that the exact factor loadings will remain over dif- 
ferent ages and selective conditions. The extension of factor analysis 
to second degree functions will remove this limitation. The psycholog- 
ical implications of second degree functions in factor theory will be 
discussed in a separate paper. 

One of the criticisms of factor analysis is that if similar tasks 
are inserted into a test battery, they will identify a common factor 
and that new factors can, therefore, be manufactured indefinitely. 
This does not happen. Several failures to verify such postulated fac- 
tors may serve to answer this form of criticism. 

In preparing the present battery of tests it was postulated that 
quickness in perceiving detail among distractors was a factor. All 
but one of the new tests did define a common factor; but one of them, 
Designs, failed to join the others. We had guessed wrong, at least 
in part, about the nature of this factor. It now seems that the per-
ceptual unity of the design from which the detail, the Σ, had to be
extracted moved this test to some other categories. This will lead to 
separate experiments in which the degree of perceptual unity which 
hides or inhibits the object of search can be varied. This may again
be a false lead, but a factorial analysis can answer the question. 

In preparing the fifty-six tests for a former test battery, the 
assumption was made tentatively that verbal reasoning, numerical 
reasoning, and space reasoning would be separate factors and that 
these would be different from verbal abstraction and visual imagery. 
Groups of tests were designed for these categories. The factor analy- 
sis demolished all of these predetermined groupings that had guided 
the test construction. The analysis cut across these anticipated group-
ings and revealed different factors. But these new factors have reap-
peared in successive test batteries. It was assumed that visualizing
flat space, visualizing solid space, and visualizing movement in solid 
space were different abilities. Groups of tests were constructed for 
these groupings. Factor analysis again cut across these groupings. 
The factorial methods will be most useful when they are applied to 
experiments specially designed to test psychological hypotheses. Mere- 
ly to apply factor analysis to any available correlation table is as fat- 
uous as any other manipulation of scientific tools without a motivat- 
ing idea. Under such conditions it can frequently be shown formally 
that the factor analysis even becomes indeterminate. 

The problem of orthogonality is of peculiar psychological impor- 
tance. Should we assume that the primary human abilities are uncor- 
related (orthogonal), or should we assume that they are correlated 
(oblique)? We should do neither. When the correlational matrix has
been factored by the centroid method, or by any other equivalent 
method, the relations between the tests are known within the restric- 
tions of linearity in the smallest possible dimensionality. The coordi- 
nate axes constitute merely an arbitrary orthogonal reference frame. 
A new set of coordinate axes must then be found that is psycholog- 
ically significant. Each axis should represent an ability or faculty. 
These may be determined if the tests show a simple configuration. 
Then we can ascertain the intercorrelations of the primary abilities. 
The analytical methods do not impose either restriction. 

We might be tempted to take for granted that the primary abil- 
ities are and should be uncorrelated, but in the present state of knowl- 
edge such an assumption is not safe. So far we have found the pri- 
mary abilities to be practically uncorrelated. When the primaries are 
determined independently, there seems to be a consistent tendency for 
them to be slightly positively correlated. But the results are not yet 
sufficiently conclusive to justify a declaration about slight positive 
correlation. In two experiments there was some indication that the 
number factor and the space factor were correlated to the extent of 
about .20, while the other primaries had correlations between zero 
and .10. These are of the order of magnitude that would be expected
by chance variation, so that the finding is not conclusive. We have 
made adjustment by choosing the nearest orthogonal reference frame. 

It may be useful to consider a case, perhaps fictitious and per- 
haps real, in which the primaries could be correlated. Let it be 
assumed that there are individual differences, not only in the primary 
abilities of adults, but also in the rate of mental development in child-
hood. Two children of like age might then differ in mental develop-
ment even if they were destined to attain the same mentality as adults.
It is unlikely that all individuals develop mentally at the same rate 
relative to their adult levels. It might be found that ten-year-old chil- 
dren of accelerated mental growth have most of their mental abilities 
more developed than ten-year-old children of slower mental growth. 
Such a situation would result in positive correlation between the pri- 
mary abilities, due to maturation, even though these abilities would 
be uncorrelated in the same population when the individuals become 
adults. The fact that the correlation could be explained would not 
make it spurious. The primary abilities could be redefined in terms 
of predicted adult performance if some independent measure of men- 
tal growth were available. If such a measure were not available, the 
rank of the system would be lower by one, and the primary factors 
would appear correlated. This problematic case is described here 
merely to show one of several situations in which primary mental 
abilities might be positively correlated even at point age. 
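
The case just described can be imitated with artificial data. The sketch below is purely illustrative and rests on assumptions that are not in the paper: two adult abilities are drawn independently with a mean of 10 and a standard deviation of 1, each child has attained a fraction of his adult level drawn uniformly between .6 and 1.0, and that one fraction applies to both abilities. All names and numbers are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson product-moment coefficient of two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_growth(n=2000, seed=0):
    """Return (adult correlation, childhood correlation) of two abilities."""
    rng = random.Random(seed)
    adult_a = [rng.gauss(10.0, 1.0) for _ in range(n)]
    adult_b = [rng.gauss(10.0, 1.0) for _ in range(n)]  # independent of adult_a
    growth = [rng.uniform(0.6, 1.0) for _ in range(n)]  # rate of development
    child_a = [g * a for g, a in zip(growth, adult_a)]
    child_b = [g * b for g, b in zip(growth, adult_b)]
    return pearson(adult_a, adult_b), pearson(child_a, child_b)


r_adult, r_child = simulate_growth()
print(round(r_adult, 2), round(r_child, 2))
```

Under these assumptions the adult abilities are practically uncorrelated, while the same abilities measured in childhood are clearly positively correlated, the correlation being due entirely to the common rate of growth.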

In current discussion of factor analysis there is frequent refer- 
ence to factors that are called “mathematical” as distinguished from 
factors that are “real” and psychologically meaningful. It should be 
clear that, as psychologists, we are not interested in mathematical 
artifacts. Factor analysis can justify itself in experimental psychol- 
ogy only in so far as it aids in the discovery of psychologically sig- 
nificant categories. It is a source of considerable satisfaction to dis- 
cover that different test batteries with different populations reveal 
the same psychological factors. These are not artifacts. It is unlikely 
that the grouping of tasks involving numerical, visual, verbal, and in- 
ductive thinking and memory appears consistently as a mathematical 
artifact in different populations and in different test batteries. To 
see these same verbal, numerical, spatial, and memory factors roll 
out of successive test batteries, even when the tests are identified 
only by code numbers, leads to the conviction that they are basic men- 
tal abilities, human faculties, rather than artifacts. 
TABLE 1 
Distributions of Scores in Twenty-Seven Tests 
to 215 Seniors at Lane Technical High School 

Code                             Scoring                Standard                             Reliability
No.   Name of Test               Formula        Mean    Deviation   Q1      Med.    Q3       Coefficient

61    Abstract Classification    …              44.80   11.53       37.26   44.64   53.29    .96
62    Abstract Classification    …              24.45   13.59       14.40   21.94   33.04    .94
31    Addition                   R               9.70    8.72        7.65    9.89   12.54    .88
15    Anagrams                   R              13.46    4.70       10.55   13.58   16.75    …
29    Areas                      R              19.33    5.18       15.65   20.46   23.85    .95
39    Arithmetical Reasoning     R               3.83    2.63        2.32    3.96    5.97    .61
11    Completion                 R              21.32    6.46       17.84   21.22   26.47    .90
63    Concrete Association       …              59.76    9.69       53.42   61.47   66.98    .96
65    Designs                    R              59.27   15.21       52.69   60.87   70.78    .88
66    Designs                    O/(R+O)        17.30   14.64        8.09   14.02   22.21    .86
12    Disarranged Words          R              38.70    9.24       33.81   38.88   45.50    .86
34    Division                   R               8.03    3.81        5.62    8.16   11.02    .87
20    Flags                      (R-W+38)       25.23   11.60       18.30   25.67   33.33    .92
67    Highest Number             R              42.84    9.67       36.68   41.75   49.71    .94
68    Highest Number             W/(R+W)        20.94   13.69       11.04   17.84   30.07    .85
26    Identical Forms            R              32.88    4.82       29.98   33.56   36.11    .92
69    Identical Names            R              28.79    …           …       …       …       .95
70    Identical Numbers          R              60.77    …           …       …       …       .97
47    Initials                   R               5.33    3.82        3.30    5.50    8.07    .84
10    Inventive Opposites        R              26.84    7.89       21.86   28.14   33.25    .92
71    Letter A                   R/…            53.00   16.87       41.73   52.08   62.52    .95
33    Multiplication             R              12.04    4.98        8.70   12.28   16.36    .89
27    Pursuit                    R              46.29    8.15       41.55   46.94   51.70    .99
40    Reasoning                  (R-W+10)       13.60    6.49        9.37   13.74   17.96    .50
73    Scattered X's              R/…            49.64    9.93       42.29   49.98   56.42    .98
74    Scattered X's              …/(R+O)         7.08    5.22        3.91    6.75    9.85    .73
35    Tabular Completion         R              21.27    7.89       16.28   21.61   27.68    .98
41    Verbal Analogies           R              20.46    8.45       15.97   20.77   25.81    .91
75    Verbal Enumeration         R/…            45.00    7.53       41.47   45.91   50.66    .91
 7    Word Grouping              R              36.88    8.58       31.28   36.87   48.04    .91
46    Word-Number                R               4.49    3.30        2.61    4.54    6.66    .80
TABLE 3 
Centroid Matrix 

Test     I     II    III    IV     V     VI    VII   VIII   IX     X

 61     .69   .36  -.21   .17  -.03  -.04  -.16   .19   .17   .41
-62     .59   .33   .09  -.18   .16   .27  -.10   .06   .19   .08
 31     .51  -.35  -.27  -.34  -.15  -.11   .09   .17   .12   .03
 15     .55   .20  -.27  -.14  -.12   .21   .12  -.19   .14  -.15
 29     .39  -.24   .39   .10   .05  -.07  -.06  -.11  -.10   .16
 39     .56   .16   .28  -.22  -.16  -.19   .22  -.05  -.14  -.06
 11     .57   .65  -.01   .08   .02   .10   .09  -.03  -.04   .08
 63     .60   .18  -.37   .26   .16  -.18  -.18   .15  -.10   .03
 65     .35  -.10   .26   .15  -.37   .25  -.28   .28  -.25  -.17
-66     .28  -.06   .46  -.17  -.10   .43  -.27   .23  -.18  -.10
 12     .54   .28  -.24   .09  -.11   .28   .09  -.18   .06  -.08
 34     .62  -.22  -.14  -.31  -.19  -.24   .10   .12  -.06   .04
 20     .41   .21   .25   .06  -.23  -.08  -.04  -.08   .17  -.11
 67     .55  -.48  -.26  -.07   .10  -.22  -.18  -.16  -.03  -.14
-68     .36  -.33   .14  -.35   .36  -.08  -.10  -.27   .12  -.04
 26     .42  -.27   .16   .30  -.08  -.13  -.14  -.08   .10   .05
 69     .51  -.32  -.33   .13   .29   .15   .06  -.11  -.15   .08
 70     .58  -.46  -.27  -.04   .18   .07  -.10  -.02  -.06  -.09
 47     .40  -.15   .05   .21   .13   .08   .32   .15   .15   .10
 10     .46   .57  -.04  -.12  -.04   .14  -.07   .07   .03   .14
 71     .45  -.30  -.25   .08  -.03   .12   .09  -.10   .11  -.13
 33     .48  -.28  -.24  -.35  -.23  -.16   .21   .18   .06   .10
 27     .33  -.26   .15   .29  -.25  -.22  -.06  -.27   .06  -.08
 40     .31   .22   .26  -.10  -.06   .05   .06  -.10  -.16   .15
 73     .34  -.35  -.14   .31  -.08  -.04  -.09  -.17  -.10   .18
-74     .24  -.16   .16  -.07   .25   .18  -.11  -.10   .18   .13
 35     .62   .14   .18  -.18  -.07  -.33   .06  -.06  -.14   .03
 41     .48   .43   .27  -.09   .08  -.09  -.02   .07  -.06  -.10
 75     .59   .18  -.29   .26   .23  -.11  -.06   .05  -.20  -.09
  7     .60   .48   .08   .09   .16  -.06  -.03   .13   .07   .03
 46     .35  -.04   .12   .12   .14  -.06   .36   .16   .06  -.17
TABLE 4 
Rotated Factorial Matrix 

Columns: V, I, P, Ratios, Doublet (Designs), N, W, S, M, Residual, h² 

Name of Test                     Code No.

Abstract Classification            61
    "      R/(R+O)                 62
Addition                           31
Anagrams                           15
Areas                              29
Arithmetical Reasoning             39
Completion                         11
Concrete Association               63
Designs                            65
    "      …/(R+O)                 66
Disarranged Words                  12
Division                           34
Flags                              20
Highest Number                     67
    "      …/(R+W)                 68
Identical Forms                    26
Identical Names                    69
Identical Numbers                  70
Initials                           47
Inventive Opposites                10
Letter A                           71
Multiplication                     33
Pursuit                            27
Reasoning                          40
Scattered X's                      73
    "      …/(R+O)                 74
Tabular Completion                 35
Verbal Analogies                   41
Verbal Enumerations                75
Word Grouping                       7
Word Number                        46

674 
575 


001 
473 
-.046 
360 
-754 
495 
087 


069 


499 
121 
416 
-.002 


-.008 


-030 
-.011 
-.009 

063 

644 

046 
-.009 
-.002 

.230 
-.145 


010 


393 
599 
484 
-708 
.178 


-.116 .362 
059 —.034 


-.087 .242 
-.008 .013 
283 .172 
482 —.021 
.253 .037 
002 .668 
-.041 .172 


094 -.106 


-.077 .081 
197 .290 
033 -.057 
015 .583 


-198 .108 


-.045 .320 
113 .526 
-.002 .540 
053 .158 
-110 -.045 
-.1138 .283 
066 .172 
000 .227 
363 —.103 
-.009 .437 


000 .006 


A487 .199 
318 .024 


148 .613 -.060 .009 -.034 


185 
123 


167 
170 


015 -.059 
411 .094 


054 —.004 
021 .018 
824 .038 
-.015 .053 
-.016 -.038 
—.068 -.060 
004 .659 


882 .674 


-.013 .044 
-.014 .007 
012 .033 
190 —.016 


-568 —.089 


152 -.025 
.230 -.015 
279 185 
-105 -.072 
077 .024 
070 = .035 
-.055 -.056 
-.030 -.065 
128 .047 
061 -.075 


499 -.017 


022 -.062 
093 .107 


101 -.034 


-.038 .019 


082 .014 .082 .024 .319 
113 .087 -.016 .064 .231 


-727 =.016 -.033 .031 
3805 .441 
-.011 -.070 .421 
3825 -.012 .271 .091 


019 .042 -.075 -.012 
016 -.052 .385 -.008 


.042 -.084 .200 .012 


071 .494 .101 .015 .102 
671 -.025 .088 —.015 .086 
048 -.049 .898 .053 -.042 
413 .083 .074 .088 -.300 


.303 -.024 -.008 -.006 -.331 


-.006 -.049 .472 .088 -.044 
138 .409 -.070 .100 .044 
362 .219 -027 .052 -.103 
089 .1238 .126 .476 .190 
041 .093 -.029 —107 .347 
275 .381 .107 .148 —.096 
-740 .020 -.001 .040 .161 
018 .038 .576 .008 -.225 
011 .068 .205 -.050 .173 
008 .206 .3828 -.072 .055 


016 .035 .067 .069 .006 


.272 -.093 .258 -.008 .017 


016 -.124 .086 .092 .051 
163 -.078 .076 .047 
-.048 -.053 .045 .155 .205 

117 + .025 .053 .512 -.028 


077 
020 -.008 -.010 
076 .022 
011 
-.094 .225 .063 .042 .296 
161 
168 


158 
TABLE 5 

          V       I       P     Ratios  Doublet    N       W       S       M      Res.

I       .518    .220    .414    .234    .092    .846    .174    .276    .185    .123
II      .773    .183   -.285   -.239   -.112   -.887    .002   -.174   -.131    .210
III    -.045    .858   -.408    .860    .230   -.254   -.398    .526    .259   -.061
IV      .019   -.237    .406   -.273   -.070   -.701    .184    .406    .317    .070
V       .080    .216    .265    .518   -.190   -.282   -.067   -.561    .805   -.181
VI     -.067   -.207   -.882    .862    .518   -.172    .587   -.072    .080    .291
VII    -.121    .891   -.296   -.359   -.271    .217    .862   -.061    .684    .095
VIII    .019   -.224    .180   -.185    .380    .187   -.581   -.325    .894    .488
IX      .184   -.664   -.386    .282   -.410    .124   -.173    .090    .271   -.113
X      -.267    .189   -.047    .268   -.481   -.062   -.034    .117   -.262    .744

TABLE 6 

Correlations R Between Primary Abilities in the Experimental Population 

             V       I       P     Ratios  Doublet    N       W       S       M

V          1.000   -.025   -.038    .058    .004    .092    .014    .069   -.024
I          -.025   1.000   -.015   -.044    .039   -.051   -.108   -.036   -.030
P          -.038   -.015   1.000    .108   -.043    .052    .076    .092    .040
Ratios      .058   -.044    .108   1.000   -.053    .064    .060    .022    .062
Doublet     .004    .039   -.043   -.053   1.000   -.026   -.030   -.017   -.049
N           .092   -.051    .052    .064   -.026   1.000    .066    .176    .047
W           .014   -.108    .076    .060   -.030    .066   1.000    .008    .041
S           .069   -.036    .092    .022   -.017    .176    .008   1.000    .085
M          -.024   -.030    .040    .062   -.049    .047    .041    .085   1.000

















ANNOUNCEMENT 


Two research assistantships in experimental psychology are available 
for the next academic year. In addition to psychological training, 
the desirable qualifications are: 1) facility in handling apparatus, 2) 
training in basic science and in mathematics. The duties will be to 
assist in conducting an experimental and factorial investigation in the 
field of perception. Opportunity will be given for graduate study. 
Applications should be in writing. 


L. L. THURSTONE, 
The University of Chicago. 
DISCUSSION OF A SET OF POINTS IN TERMS OF 
THEIR MUTUAL DISTANCES* 


GALE YOUNG AND A. S. HOUSEHOLDER 
The University of Chicago 


Necessary and sufficient conditions are given for a set of num- 
bers to be the mutual distances of a set of real points in Euclidean 
space, and matrices are found whose ranks determine the dimen- 
sion of the smallest Euclidean space containing such points. Methods 
are indicated for determining the configuration of these points, and 
for approximating to them by points in a space of lower 
dimensionality. 


Ordinarily a set of points is specified by giving its coordinates in 
a suitable reference system; and the dimensionality of the set, the 
problem of approximating it by a lower dimensional set, etc., can be 
discussed in terms of these coordinates. It may be, however, that 
only the distances of the points from each other are known, and it is 
desired to give a similar discussion on this basis. 

Consider a set of $n$ points, and let $a_i$, $i = 1, \cdots, n-1$, be the vector 
from point $n$ to point $i$. Let $a_{ij}$ be the component of $a_i$ along the $j$-th 
axis of an orthogonal coordinate system with origin at point $n$ and 
let $A$ denote the matrix $(a_{ij})$. The dimensionality of the point set is 
equal to the rank of $A$ and to the rank of $B = AA'$. The elements of 
$B$ are given by $b_{ij} = a_i \cdot a_j$. The vector from point $i$ to point $j$ is 
$v_{ij} = a_j - a_i$, and by taking the scalar product of each side with 
itself there results the familiar 'cosine law': 

$$d_{ij}^2 = d_{jn}^2 + d_{in}^2 - 2\, a_i \cdot a_j ,$$

where $d_{ij}$ is the distance between points $i$ and $j$. From this it follows 
at once that 

$$b_{ij} = (d_{in}^2 + d_{jn}^2 - d_{ij}^2)/2 , \tag{1}$$

so that $AA'$ is expressible in terms of the mutual distances only. Thus 

(I) The dimensionality of a set of points with mutual distances 
$d_{ij}$ is equal to the rank of the $n-1$ square matrix $B$ whose elements 
are defined by (1). 


A matrix first given by Cayley in 1841, and involving the points 
in more symmetric fashion, may be used in place of the matrix B. 

*This paper was written in response to suggestions by Harold Gulliksen and 
by M. W. Richardson. The latter is working on a psychophysical problem in 
which the dimensionality of a set of points whose mutual distances are available 
is a central idea. 

First, the matrix $C = -2B$ has evidently the same rank $r$ as $B$. Border 
this to obtain the $n + 1$ square matrix 

$$D = \begin{pmatrix} C & 0 & 1 \\ 0 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix},$$

in which the bordering 0 and 1 next to $C$ stand for a column (or row) of 
$n-1$ zeros or ones. 

Since $C$ is symmetric it has a non-vanishing $r \times r$ principal minor 
$M_r$. The determinant of the minor 

$$S = \begin{pmatrix} M_r & 0 & 1 \\ 0 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$$

has for its value 

$$|S| = -|M_r| \neq 0 ,$$

as we see by a Laplace expansion. Hence the matrix $D$ has rank 
$r + 2$ at least. If $D$ had a greater rank there would be some minor 
$M_{r+1}$ of order $r + 1$ of $C$ for which 

$$\begin{pmatrix} M_{r+1} & 0 & 1 \\ 0 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$$

is non-singular. But since $M_{r+1}$ is singular, the determinant of this 
matrix is $-|M_{r+1}| = 0$. Hence the rank of $D$ is exactly $r + 2$. 
Now perform the following operations on $D$: 

a) To row $i$, $i = 1, \cdots, n-1$, add $d_{in}^2 \times$ row $n+1$. 
b) To column $j$, $j = 1, \cdots, n-1$, add $d_{jn}^2 \times$ column $n+1$. 

These are so-called elementary transformations, and do not change 
the rank of a matrix. The result is 

$$F = \begin{pmatrix}
0 & d_{12}^2 & \cdots & d_{1n}^2 & 1 \\
d_{21}^2 & 0 & \cdots & d_{2n}^2 & 1 \\
\vdots & \vdots & & \vdots & \vdots \\
d_{n1}^2 & d_{n2}^2 & \cdots & 0 & 1 \\
1 & 1 & \cdots & 1 & 0
\end{pmatrix} . \tag{2}$$

Thus 
(II) The dimensionality of a set of points with mutual distances 
$d_{ij}$ is two less than the rank of the $n+1$ square matrix $F$ given by (2). 
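
The two rank statements can be checked numerically. The following is a minimal sketch, not the authors' procedure: the helper names (`matrix_rank`, `matrix_B`, `matrix_F`), the tolerance, and the three example point sets are all assumptions made for illustration. It builds $B$ from equation (1) and the bordered matrix $F$ from (2) using only the mutual distances, then compares their ranks with the known dimension of each point set (the rank of $F$ exceeding that of $B$ by two, as the derivation of rank $r + 2$ requires).

```python
import math

def matrix_rank(m, tol=1e-9):
    """Rank by Gaussian elimination with partial pivoting; tol is an assumed cutoff."""
    a = [row[:] for row in m]
    rows, cols = len(a), len(a[0])
    rank = 0
    for c in range(cols):
        if rank == rows:
            break
        pivot = max(range(rank, rows), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) < tol:
            continue  # no usable pivot in this column
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, rows):
            f = a[r][c] / a[rank][c]
            for cc in range(c, cols):
                a[r][cc] -= f * a[rank][cc]
        rank += 1
    return rank

def distance_matrix(points):
    """Mutual distances; coordinates are used only to manufacture test data."""
    return [[math.dist(p, q) for q in points] for p in points]

def matrix_B(d):
    """Equation (1), taking the last point as the origin (point n)."""
    n = len(d)
    return [[(d[i][n - 1] ** 2 + d[j][n - 1] ** 2 - d[i][j] ** 2) / 2.0
             for j in range(n - 1)] for i in range(n - 1)]

def matrix_F(d):
    """Equation (2): squared distances bordered by a row and a column of ones."""
    n = len(d)
    f = [[d[i][j] ** 2 for j in range(n)] + [1.0] for i in range(n)]
    f.append([1.0] * n + [0.0])
    return f

examples = {
    "collinear": [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)],
    "square": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "tetrahedron": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)],
}
ranks = {}
for name, pts in examples.items():
    d = distance_matrix(pts)
    ranks[name] = (matrix_rank(matrix_B(d)), matrix_rank(matrix_F(d)))
    print(name, ranks[name])
```

A numerical rank always depends on the tolerance; with empirical (fallible) distances one would look at how small the trailing pivots or latent roots are rather than test for exact zeros.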
Consider next the conditions under which a set of numbers 
$d_{ij} = d_{ji}$ can be the mutual distances of a set of real points in Euclidean 
space. It is evident to begin with that if such a set of points 
exists then any other such set defines a figure which is congruent 
(or symmetric) to the first. Moreover, if we form the matrix B whose 
elements are defined by Equation (1), then necessarily B is symmetric, 
and is equal to the product AA’ of the matrix A of coordinates of 
these points, in any coordinate system with origin at point n, by the 
transpose of this matrix. Hence B is positive semi-definite. 

Conversely, if $B$ is positive semi-definite, then such points do 
exist, and to show this we need only exhibit a matrix $A$. Since $B$ 
then has only positive or zero latent roots, there exists an orthogonal 
matrix $c$ such that 

$$B = c L^2 c' = (cL)(cL)' , \tag{3}$$

where 

$$L^2 = [\lambda_1^2, \lambda_2^2, \cdots, \lambda_r^2, 0, \cdots, 0] \tag{4}$$

is a diagonal matrix of the latent roots of $B$. Hence we may take 

$$A = cL , \tag{5}$$

and the theorem is proved: 

(III) A necessary and sufficient condition for a set of numbers 
$d_{ij} = d_{ji}$ to be the mutual distances of a real set of points in Euclidean 
space is that the matrix $B$ whose elements are defined by equation 
(1) be positive semi-definite; and in this case the set of points is 
unique apart from a Euclidean transformation. 
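
Theorem (III) can be tried out on small cases. The sketch below is an illustration under assumptions, not the latent-root factorization of equations (3) to (5): it swaps in a diagonally pivoted Cholesky-type factorization, which also yields a matrix $A$ with $AA' = B$ when $B$ is positive semi-definite and reports failure otherwise. Any such $A$ differs from $cL$ only by a rotation or reflection, which is immaterial for the distances. The function names and the tolerance are hypothetical.

```python
import math

def matrix_B(d):
    """Equation (1), with the last point taken as origin."""
    n = len(d)
    return [[(d[i][n - 1] ** 2 + d[j][n - 1] ** 2 - d[i][j] ** 2) / 2.0
             for j in range(n - 1)] for i in range(n - 1)]

def factor_psd(b, tol=1e-9):
    """Return A with A A' = b if b is positive semi-definite, else None."""
    m = len(b)
    resid = [row[:] for row in b]
    cols = []  # columns of A, one per positive pivot
    for _ in range(m):
        p = max(range(m), key=lambda i: resid[i][i])
        pivot = resid[p][p]
        if pivot <= tol:
            break
        col = [resid[i][p] / math.sqrt(pivot) for i in range(m)]
        cols.append(col)
        for i in range(m):
            for j in range(m):
                resid[i][j] -= col[i] * col[j]
    # b is positive semi-definite only if nothing is left once the
    # positive pivots are exhausted.
    if any(abs(resid[i][j]) > tol for i in range(m) for j in range(m)):
        return None
    return [[c[i] for c in cols] for i in range(m)]

def recover_points(d, tol=1e-9):
    """Coordinates of points 1..n (point n at the origin), or None."""
    a = factor_psd(matrix_B(d), tol)
    if a is None:
        return None
    dim = len(a[0])
    return a + [[0.0] * dim]

# A 3-4-5 triangle satisfies the triangle law; sides 10, 3, 4 do not.
good = [[0.0, 5.0, 3.0], [5.0, 0.0, 4.0], [3.0, 4.0, 0.0]]
bad = [[0.0, 10.0, 3.0], [10.0, 0.0, 4.0], [3.0, 4.0, 0.0]]

points = recover_points(good)
print(points)
print(recover_points(bad))
```

The recovered coordinates reproduce the given distances, while the second set of numbers is rejected because its $B$ has a negative latent root.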

For the case n = 3, the condition that B be positive semi-definite 
is equivalent to the familiar triangle law: that each side of a triangle 
be less than, or equal to, the sum of the other two. In general, the 
positiveness of the determinant of the 2 × 2 principal minor on rows 
$i$ and $j$ gives the triangle relation on $d_{in}$, $d_{jn}$ and $d_{ij}$; while the 
corresponding requirement on the larger principal minors gives an 
extension of this law. The present problem of determining a matrix A 
which specifies the configuration of the set of points is merely a 
generalization of the familiar trigonometry problem of finding a 
triangle when the lengths of its sides are given. 

For the actual factorization of the matrix B we may refer to a 
method given by Thurstone*. Methods of fitting a lower dimensional 
set of points to a given set are also available,† so that the complete 
analysis of a set of points is possible given the mutual distances only. 


*Thurstone, L. L., Vectors of Mind, Chicago: University of Chicago Press, 
p. 78. 
†Eckart, Carl, and Young, Gale, “The Approximation of One Matrix by 
Another of Lower Rank,” Psychometrika, 1936, 1, 211-218. 
WEIGHTING SYSTEMS FOR LINEAR FUNCTIONS OF 
CORRELATED VARIABLES WHEN THERE IS 
NO DEPENDENT VARIABLE 


S. S. WILKS 


Princeton University 


When no criterion variable is available, the combination of tests 
or other variables by the use of multiple correlation is not possible. 
Three methods of combining variables are described mathematically, 
and discussed with reference to the linear combination of tests. 
Iterative computational schemes are outlined and illustrated. 


1. Introductory remarks 

The problem of weighting or assigning coefficients in a linear 
function of several statistical variables has been discussed by many 
authors from the classical point of view of least squares. To indicate 
the point of departure for this paper it is perhaps worthwhile to 
make a few remarks about weighting introduced by the method of 
least squares and the type of problems to which the method applies. 
The problems which will be considered have arisen more or less di- 
rectly in connection with tests and therefore it will be convenient to 
state and discuss the problems in terms of tests. The generality of 
the results can be easily abstracted and probably will extend to other 
fields than testing, for example, that of constructing business index 
numbers. 

To discuss the least squares type of problem somewhat precisely, 
let $y$ be a dependent variable and $x_1, x_2, \cdots, x_n$ independent variables. 
In certain kinds of tests, particularly those in very narrow and 
objective fields of activity, $y$ may be regarded as a variable which 
directly measures ability in a particular field of activity, say D, and the 
x’s may be taken as measures of characteristics associated with D. 
Each of several individuals in a sample from a certain population P 
is measured on each of the $n + 1$ variables $y, x_1, x_2, \cdots, x_n$, the values 
for the $\alpha$-th individual being $y_\alpha, x_{1\alpha}, x_{2\alpha}, \cdots, x_{n\alpha}$. Adopting a linear 
function of the x’s, say $L = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n$, for obtaining 
estimates of $y$ for each individual, we now want to determine the $\beta$’s 
so that in some sense, L is the “best” linear function of the x’s for 
estimating the y’s of the individuals from P. If we define “best” here 
in the sense of least squares, we then seek values of the $\beta$’s so that 
$\sum_\alpha (y_\alpha - L_\alpha)^2$ is a minimum. It is well known that the values of the 
$\beta$’s which minimize the sum of squares are the regression coefficients 
$b_{01 \cdot 23 \cdots n}, b_{02 \cdot 134 \cdots n}, \cdots$ whose explicit expressions need not be given here. 
(To fit into the notation $y$ would be regarded as a variable $x_0$.) It is 
also a commonplace fact that the linear function L with coefficients 
thus obtained is more highly correlated with $y$ than any other linear 
function of $x_1, x_2, \cdots, x_n$. The practical value of L with the least 
squares coefficients lies in its use for prediction purposes, that is, for 
estimating values of the $y$ for those individuals from P for which 
values of the x’s are available, but values of y are not. 
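
The "commonplace fact" that the least squares function correlates more highly with $y$ than any other linear function of the x's can be seen in a small numerical sketch. Everything specific here is an assumption made for illustration: the simulated data, the two-predictor setting, and the helper names (`solve`, `least_squares_weights`, `composite`). The variables are expressed as deviations from their means so that no constant term is needed.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                for cc in range(c, n + 1):
                    m[r][cc] -= f * m[c][cc]
    return [m[i][n] / m[i][i] for i in range(n)]

def center(v):
    mean = sum(v) / len(v)
    return [a - mean for a in v]

def corr(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def least_squares_weights(xs, y):
    """Normal equations for the beta's; xs is a list of predictor columns."""
    n = len(xs)
    xtx = [[sum(a * b for a, b in zip(xs[i], xs[j])) for j in range(n)]
           for i in range(n)]
    xty = [sum(a * b for a, b in zip(xs[i], y)) for i in range(n)]
    return solve(xtx, xty)

def composite(weights, xs):
    """Values of L = sum of weight * x for every individual."""
    return [sum(w * col[a] for w, col in zip(weights, xs))
            for a in range(len(xs[0]))]

# Hypothetical sample of 200 individuals with two correlated predictors.
rng = random.Random(0)
N = 200
x1 = center([rng.gauss(0.0, 1.0) for _ in range(N)])
x2 = center([0.5 * a + rng.gauss(0.0, 1.0) for a in x1])
y = center([2.0 * a - 1.0 * b + rng.gauss(0.0, 1.0) for a, b in zip(x1, x2)])
xs = [x1, x2]

beta = least_squares_weights(xs, y)
r_best = corr(composite(beta, xs), y)
r_unit = corr(composite([1.0, 1.0], xs), y)
print("least squares weights:", [round(b, 3) for b in beta])
print("correlation with y, least squares weights:", round(r_best, 4))
print("correlation with y, unit weights:         ", round(r_unit, 4))
```

The whole construction presupposes that values of $y$ are on hand for the sample, which is exactly what is missing in the situation taken up next.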

In using the method of least squares an essential part of the 
situation is the availability of y’s for some of the individuals. How- 
ever, in the construction of tests, especially those in mental and other 
less objective activities, where ability is measurable, if at all, only in 
an indirect or a roundabout manner through variables of the x type, 
values of y are not available. In such cases we must resort to other 
methods than that of multiple correlation for obtaining an index 
whose values can be reliably associated with the various degrees of 
ability. To be more specific, when such a domain of activity, D, is 
under consideration for a population, P, of individuals, the usual 
procedure for devising an index or score of it for each individual is to 
define a population $Q_D$ of directly measurable tasks or items 
associated with ability in D, such that the performance of an individual 
from P on any one of them can be made to correspond with a value 
of a corresponding variable. A test $S_D$ of $n$ items is then selected from 
$Q_D$. The performance of an individual from P on the test is then 
described by $n$ variables $x_1, x_2, \cdots, x_n$, some or all of which could take on 
as few as two values, 0 or 1, as, for example, when performance on 
an item is classified as “wrong” or “right.” Without loss of generality 
each item can be defined so that increasing values of the variable 
x are associated with increasing levels of ability in D. Thus, 
corresponding to each item in $Q_D$ is a variable which in turn has a 
distribution function determined by the responses of all individuals in P 
to the item. As far as the meaning of the phrase “levels of ability in 
D” is concerned it must be left as an undefined intuitive concept. 
After discussing the practical meaning of Theorem I, we shall give a 
pragmatic definition of the phrase. Finally, the index is taken as 
some function of $x_1, x_2, \cdots, x_n$. It is customary to refer to a value of 
this index for an individual as a test or fallible score of the 
individual. The individuals of the population P would then give rise to a 
distribution of scores on a given test $S_D$ which would be subject to 
fluctuations due to fallibility of scores, that is, due to finite n. A sample 
of N individuals from P would have its distribution which would be 
subject to sampling variations due to fallibility of scores and also due 
to finite N. A score so constructed can have no intrinsic meaning, its 
value depending on its rank with respect to scores of other individ- 
uals in the sample or more generally in the population. If the items 
used in the test are such that the order of the scores for several in- 
dividuals from P can be made as highly stable in some sense, as we 
please, by taking a sufficiently large number n, then there is reason 
for adopting the score as a device for ordering individuals in P with 
respect to ability in D. The scores in the order they would appear 
ultimately for increasing values of n are referred to as true scores. 
The degree of stability of the order of individuals established by a 
test is known as the reliability of the test. Various schemes have been 
devised for measuring reliability quantitatively. 

The simplest type of a function of the x’s to deal with, both the- 
oretically and practically, is a linear one. The purpose of this paper 
is to discuss certain properties of linear functions of the x’s when n 
is large and consider methods of determining “best” linear functions 
when n is small, according to each of several definitions of “best”. 


2. Linear functions of a large number of x’s. 


Here we shall consider the stability of the ordering of individ- 
uals in a sample, using linear functions, by examining the correlation 
coefficient between two linear functions with different coefficients. 

Let $x_1, x_2, \cdots, x_n$ be correlated variables each expressed in units 
of its standard deviation. Let $k_1, k_2, \cdots, k_n$ and $l_1, l_2, \cdots, l_n$ be two sets 
of coefficients. Consider the two linear functions 

$$L_1 = k_1 x_1 + k_2 x_2 + \cdots + k_n x_n , \qquad
L_2 = l_1 x_1 + l_2 x_2 + \cdots + l_n x_n . \tag{1}$$

Denoting the correlation between $x_i$ and $x_j$ by $r_{ij}$ ($r_{ii} = 1$), the 
correlation between $L_1$ and $L_2$ is 

$$R = \frac{\sum_{i,j} k_i l_j r_{ij}}{\sqrt{\big(\sum_{i,j} k_i k_j r_{ij}\big)\big(\sum_{i,j} l_i l_j r_{ij}\big)}} . \tag{2}$$

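
As a small worked instance of formula (2), the sketch below evaluates the correlation between two composites directly from the two sets of coefficients and the matrix of intercorrelations. The function name `composite_correlation` and the two-variable example are assumptions chosen only for illustration.

```python
import math

def composite_correlation(k, l, r):
    """Formula (2): correlation of sum(k_i x_i) with sum(l_i x_i), unit-variance x's."""
    n = len(k)
    num = sum(k[i] * l[j] * r[i][j] for i in range(n) for j in range(n))
    var_k = sum(k[i] * k[j] * r[i][j] for i in range(n) for j in range(n))
    var_l = sum(l[i] * l[j] * r[i][j] for i in range(n) for j in range(n))
    return num / math.sqrt(var_k * var_l)

# Two variables correlating .5, weighted (1, 1) in one score and (1, 2) in the other.
r = [[1.0, 0.5],
     [0.5, 1.0]]
R = composite_correlation([1.0, 1.0], [1.0, 2.0], r)
print(round(R, 4))
```

Here the numerator is 4.5 and the two sums under the radical are 3 and 7, so that $R = 4.5/\sqrt{21}$; even with only two variables, doubling one weight leaves the two scores correlated above .98.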





Now suppose we consider $k_1, \cdots, k_n$ as numbers drawn at random 
from a population in which the first and second moments about the 
origin are $a_1$ and $a_2$. Similarly, $l_1, \cdots, l_n$ are considered as randomly 
drawn numbers from a population in which the first and second 
moments about the origin are $b_1$ and $b_2$. Suppose the k’s, l’s and r’s are 
mutually independent. Writing R as 

$$R = \frac{\sum_i k_i l_i + \sum_{i \neq j} k_i l_j r_{ij}}
{\sqrt{\big(\sum_i k_i^2 + \sum_{i \neq j} k_i k_j r_{ij}\big)\big(\sum_i l_i^2 + \sum_{i \neq j} l_i l_j r_{ij}\big)}} , \tag{3}$$

and using the Law of Large Numbers, we can replace $\sum_i k_i l_i$ by 
$n\big(a_1 b_1 + \delta_1/\sqrt{n}\big)$, where $\delta_1$ is a statistical variable which has a zero 
mean value and a variance independent of $n$ (except possibly for the 
trivial case of added terms of order $1/n$). Similarly, $\sum_{i \neq j} k_i l_j r_{ij}$ can be 
replaced by $(n^2 - n)\big(a_1 b_1 r + \delta_2/\sqrt{n^2 - n}\big)$, where $r$ is the mean value of 
the r’s, and $\delta_2$ is a variable having zero mean value and a variance 
independent of $n$. The remaining sums in (3) can be replaced by similar 
expressions. If R then be expanded into a series in $1/\sqrt{n}$, which can 
be done by choosing a sufficiently large $n$, we have 

$$R = 1 + \frac{1}{2nr}\Big(2 - \frac{a_2}{a_1^2} - \frac{b_2}{b_1^2}\Big)
+ \frac{\epsilon_1}{n} + \frac{\epsilon_2}{n^{3/2}} + \frac{\epsilon_3}{n^2} , \tag{4}$$

where $\epsilon_1$ and $\epsilon_2$ are variables having zero mean values and variances 
independent of $n$, and $\epsilon_3$ is such that its variance and mean value do 
not depend on $n$. From (4) it will be seen that the mean value of R 
(neglecting terms of order $1/n^2$ and higher) is 

$$1 + \frac{1}{2nr}\Big(2 - \frac{a_2}{a_1^2} - \frac{b_2}{b_1^2}\Big)
= 1 - \frac{1}{2nr}\Big(\frac{\sigma_k^2}{a_1^2} + \frac{\sigma_l^2}{b_1^2}\Big)$$

(where $\sigma_k^2$ and $\sigma_l^2$ are the variances of the k’s and l’s), and the 
variance of R is proportional to $1/n^2$. Thus, as n increases, the 
distribution of R rapidly “piles up” at the upper end of the range of R. R 
stochastically converges to 1 as n increases. It will be seen that in 
order for this “piling-up” effect to occur, $r$, $a_1$ and $b_1$ must be 
different from zero. It can be shown from the fact that the correlation 
matrix $\| r_{ij} \|$ is positive definite that $r$ cannot be negative for an 
indefinitely large number of intercorrelated variables, for we must have 
$\sum r_{ij} \theta_i \theta_j > 0$ for all values of the $\theta$’s. Thus, taking all the $\theta$’s equal 
to a common value $\theta$, $\theta^2 [n + (n^2 - n) r] > 0$, or 
$r > -\dfrac{1}{n-1}$, which means that $r$ cannot be negative for all values of 
n, i.e., an indefinitely large number of variables. The results 
regarding the mean value and variance of R hold somewhat more strongly 
in case the pairs of weights $k_i$, $l_i$, $i = 1, 2, \cdots$, are correlated, there 
being no other dependence among the k’s, l’s and r’s. Another 
condition that must clearly hold is that the number of non-zero 
correlations must be of order $n^2$. (For example, $n^2 - n$ is of order $n^2$, while 
$cn$ is not, where $c$ is a constant). We briefly summarize the foregoing 
results in 

Theorem 1: Let $x_1, x_2, \cdots, x_n$ be intercorrelated variates, each with 
unit variance, and let $r_{ij}$ be the correlation coefficient between $x_i$ and 
$x_j$, the number of non-zero correlations being of order $n^2$. Let $k_1, 
k_2, \cdots, k_n$ and $l_1, l_2, \cdots, l_n$ be regarded as variables such that if there is 
any probability dependence among the r’s, k’s and l’s it is between 
k’s and l’s. Let the mean value of the k’s and l’s be non-zero. Then 
the correlation R between linear functions $L_1$ and $L_2$ given by (1) is 
distributed with variance proportional to $1/n^2$ about a mean which 
differs from 1 by terms of order $1/n$. 
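
The tendency described in Theorem 1 can be watched in a simulation. What follows is only a sketch under simplifying assumptions that are not in the text: every off-diagonal correlation is given the same value $r$, and both sets of weights are drawn independently from a uniform distribution on (0.5, 1.5), so that their means are non-zero. With equal correlations the double sums in (3) reduce to simple sums, since $\sum_{i \neq j} k_i l_j = (\sum k_i)(\sum l_j) - \sum k_i l_i$.

```python
import math
import random

def composite_correlation_equal_r(k, l, r):
    """Equation (3) when every off-diagonal correlation equals r (assumed here)."""
    skl = sum(a * b for a, b in zip(k, l))
    skk = sum(a * a for a in k)
    sll = sum(b * b for b in l)
    sk, sl = sum(k), sum(l)
    num = skl + r * (sk * sl - skl)
    den_k = skk + r * (sk * sk - skk)
    den_l = sll + r * (sl * sl - sll)
    return num / math.sqrt(den_k * den_l)

def mean_R(n, r=0.3, trials=200, seed=0):
    """Average R over repeated random draws of the two weight sets."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        k = [rng.uniform(0.5, 1.5) for _ in range(n)]
        l = [rng.uniform(0.5, 1.5) for _ in range(n)]
        total += composite_correlation_equal_r(k, l, r)
    return total / trials

for n in (10, 100, 1000):
    m = mean_R(n)
    # n * (1 - mean R) staying roughly constant indicates a departure of order 1/n.
    print(n, round(m, 5), round(n * (1.0 - m), 3))
```

The simulation is meant to show the order of magnitude of the departure from unity, not to check the constant in the expression for the mean, which depends on how the r's and the weights are assumed to vary.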


The practical bearing of this Theorem on testing is fairly obvious. 
In fact, $x_1, x_2, \cdots, x_n$ will be variables for giving item-scores 
corresponding to performance on n items from $Q_D$ for testing ability in 
D. The items for the test $S_D$ are chosen so that they are mutually 
correlated, or at least, so that the number of non-zero (actually positive) 
correlations is of order $n^2$, that is, roughly proportional to the number 
of correlations. If we form any two linear functions (test scores) of 
these x’s, such as $L_1$ and $L_2$, being careful that the means of the k’s 
and l’s are not zero (which is almost always insured in practice by 
taking them as positive) and choosing them more or less randomly 
with respect to each other, then with increasing n (size of test) the 
L’s ultimately become perfectly correlated with each other, the 
variance of R for large values of n being proportional to $1/n^2$. In a long 
test of intercorrelated items, it matters very little how the individual 
items are weighted, thus showing that the relative order of scores for 
individuals from P tends to be stable, or invariant for different 
methods of obtaining linear scores.* It is to be noted that unless the 
number of positive correlations among the x’s is of order $n^2$, then it is not 
possible to construct a linear score which would possess this type of 
stability in ordering the individuals taking the test. Although the 
fact that such an invariant ordering is possible does not definitely 
prove that the scores rank individuals with respect to what the 
expert in D had in mind when he defined the population of items $Q_D$, it 
appears that in order to be realistic we could define an estimate of 
the ability in D for an individual as corresponding to the value of a 
linear function of x variables, which describe his performance on the 
test consisting of items selected from $Q_D$. At least this appears to 
be a definition of ability in D with practical meaning since the values 
of the linear function for several individuals assume a stable order 
when a sufficiently large number of items is used. Such a scheme, 
however, does not provide what may be termed a “metric”, that is, a 
scale for comparing the difference between the performances of 
individuals A and B with the difference between those of C and D. It 
only provides a method of ranking A, B, C, D, etc. It will be seen 
that the concepts of the two populations P and $Q_D$ play a very 
essential part in the discussion because one of the main assumptions is 
that the properties of the r’s are valid for performance of individuals 
from P on items from $Q_D$, which in turn are defined by the expert in 
D as being associated with “ability in D”. 

*Empirical evidence of this fact has been gathered and incorporated in a 
non-technical paper by J. M. Stalnaker entitled: “Weighting Questions in the 
Essay Type Examination”, Journal of Educational Psychology, 28, 1937. 

The ideal situation in establishing the order of individuals with 
respect to ability in D is to have them take a test with a large 
number of items. From a practical point of view, however, it is not always 
possible to have a large number of items, in which case problems arise 
of (i) selecting a given number of “good” items from $Q_D$, and (ii) 
of finding the “best” linear function of the x’s for those items 
selected. Since the mean value of R is 

$$1 - \frac{1}{2nr}\Big(\frac{\sigma_k^2}{a_1^2} + \frac{\sigma_l^2}{b_1^2}\Big) ,$$

it follows that for a given value of n, the larger the value of r the 
more nearly the mean value of R is equal to unity, that is, the more 
stable will be the ranking of the individuals. This fact shows that if 
it is not practicable to select more than a certain fairly small number 
of items from $Q_D$ we cannot minimize the importance of selecting a 
representative set of items with relatively high values of the r’s. High 
values of the r’s are obviously not sufficient because of the possibility 
that items having high r’s may come from a very special sub-set of 
items in $Q_D$, the set thus not being a representative sample from $Q_D$. 
In this paper we shall not deal with the well-known problems arising 
in (i), that is those dealing with the selection and analysis of items, 
but shall pass on to those in (ii). 
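
The effect of the mean correlation on the stability of ranking can be put in numbers. The sketch below assumes that the mean value of R is read as $1 - \frac{1}{2nr}\big(\sigma_k^2/a_1^2 + \sigma_l^2/b_1^2\big)$; the function name `approx_mean_R` and the example figures (ten items, squared coefficients of variation of 1/12 for both sets of weights) are hypothetical.

```python
def approx_mean_R(n, r, cv2_k, cv2_l):
    """Approximate mean of R for n items with mean intercorrelation r.

    cv2_k and cv2_l are the squared coefficients of variation of the two
    sets of weights (variance divided by squared mean).
    """
    return 1.0 - (cv2_k + cv2_l) / (2.0 * n * r)

# Ten items, weights with squared coefficient of variation 1/12 in both sets.
for r in (0.2, 0.4, 0.6):
    print(r, round(approx_mean_R(10, r, 1.0 / 12.0, 1.0 / 12.0), 4))
```

For a fixed short test the shortfall from unity is inversely proportional to $r$, so trebling the mean intercorrelation of the items cuts the shortfall to one third.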

In many instances, the items in $Q_D$ are classified into various 
narrower sub-populations $Q_{D_1}, Q_{D_2}, \cdots, Q_{D_s}$. Thus, the test is 
essentially a stratified sample from $Q_D$, consisting of a set of sub-tests 
$S_{D_1}, S_{D_2}, \cdots, S_{D_s}$, each composed of a specified number of items from its 
respective population. The performance of an individual from P on 
test $S_D$ would be specified by a value (score) of each of s variables 
$z_1, z_2, \cdots, z_s$, $z_i$ being the variable for the i-th sub-test $S_{D_i}$. The z’s, in 
turn, are linear functions of the variables corresponding to the items 
entering into the respective sub-tests. In an extreme case we could 
regard each sub-test as consisting of one item, this including the case 
of a test composed of s items. It follows from Theorem 1 that by 
making the number of items in each sub-test sufficiently large we can rank the 
individuals on each of the sub-tests with as high a degree of reliability 
as we please. With only a small number of sub-tests, however, the 
problem still remains of linearly combining the sub-test scores to 
obtain the “best” composite or total score. We shall now consider this 
problem. 


3. Weighting by minimizing the generalized variance of all individ- 
uals receiving the same score. (Method A). 


For convenience, let $(z_i - m_i)/\sigma_i = t_i$, where $m_i$ is the mean 
and $\sigma_i$ the standard deviation of $z_i$ for all individuals in P. In 
practice, of course, $m_i$, $\sigma_i$ and the correlations among the z’s would 
have to be estimated from the sample of individuals taking the test. 
The performance of an individual on the composite test $S_D$ can be 
represented geometrically by a point in an s-dimensional Euclidean 
space $V_s$ of the t’s. The population of individuals will be represented 
by a distribution function in the space. We shall simplify the 
problem by assuming the distribution function to be normal. The results 
will hold, for practical purposes, for moderate departures from 
normality. The distribution function will be constant on the “surface” 
of each of a one-parameter family of similar and similarly placed 
s-dimensional ellipsoids given by the equation 

$$\sum_{i,j} A_{ij} t_i t_j = \delta , \tag{5}$$

where $\| A_{ij} \|$ is the inverse of the matrix $\| r_{ij} \|$, $r_{ij}$ being the 
correlation between $t_i$ and $t_j$, and $r_{ii} = 1$. Let the principal axes of the 
ellipsoids be proportional to $a_1, a_2, \cdots, a_s$ where $a_1$ is assumed larger 
than any of the other a’s. Now, it can be shown that if all r’s are 
positive, then the line coinciding with the largest axis extends through 
the origin into the generalized first quadrant, i.e., the one in which 
all coordinates are positive. 

Consider a linear function of the t’s, say, $T = k_1 t_1 + k_2 t_2 + \cdots + k_s t_s$.
Those individuals who would receive the same value of T
would then be represented by points in the t-space which lie on the
“plane” $T = k_1 t_1 + k_2 t_2 + \cdots + k_s t_s$, which is an (s-1)-dimensional
space $V_{s-1}$ normal to the direction $k_1 : k_2 : \cdots : k_s$. Our object
now is to choose a set of coefficients $k_1, \ldots, k_s$ so that the generalized
variance of individuals in a plane having these coefficients will be
smaller than that of the individuals in a plane having any other set
of coefficients. Generalized variance for several variables is an
analogue of variance for one variable. The variance for one variable, it
will be recalled, is the mean value of the squares of the lengths of the
segments representing the deviations of the individuals from the
mean of the distribution. In the case of two variables, if we form
triangles with all possible pairs of individuals (each represented by
a point) using the means of the two variables as coordinates of the
third point in each case, square the area of each triangle and average
the squared areas for all pairs of individuals in the population,
the result multiplied by (2!) is the generalized variance of the
bivariate distribution. In the case of N dimensions, N individuals and
the point representing the means of the N variables determine an
N-dimensional simplex, i.e., an N-dimensional generalized tetrahedron.
If we form all possible such simplexes, square their volumes,
and take the mean value of the squared volumes, multiplying by (N!),
we get the generalized variance* of the N variables. The value of the
generalized variance is given by the determinant of the variances and
covariances of the variables. The generalized variance is an index
for indicating to what extent a distribution is concentrated in a space
of a lower number of dimensions than the original space. For
example, the ordinary variance indicates how highly the one-variable
distribution is concentrated about a point; the smaller the variance
the greater the concentration. The generalized variance for two
variables indicates how highly concentrated the distribution is in a line
or a point, and so on. In the problem we are considering, it is
intuitively evident that we want the ranking of individuals to be as
stable as possible, that is, we want to choose a set of parallel “planes”
in each of which the points representing individuals have maximum
concentration. In other words, we want the generalized variance of
the distribution of individuals receiving a given score to be a
minimum. Although it is intuitively plausible that these planes would be
perpendicular to the largest axis of the ellipsoids, we shall give a
more substantial argument of the fact.
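The statement that the generalized variance is the determinant of the variances and covariances can be illustrated numerically. The following is only a minimal sketch: the function names are ours, not the paper's, and the two-variable matrices are hypothetical standardized cases chosen for illustration.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        # Pick the row with the largest entry in this column as pivot.
        pivot = max(range(col, n), key=lambda row: abs(a[row][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
    return det


def generalized_variance(cov):
    """Generalized variance: determinant of the variance-covariance matrix."""
    return determinant(cov)


# Two standardized variables with correlation r (hypothetical values):
# the generalized variance is 1 - r^2, which shrinks toward zero as the
# distribution concentrates along a line.
for r in (0.0, 0.5, 0.9, 0.99):
    gv = generalized_variance([[1.0, r], [r, 1.0]])
    print(f"r = {r:4.2f}   generalized variance = {gv:.4f}")
```

The smaller the printed value, the more nearly the bivariate distribution lies in a line, which is the sense of "concentration" used in the text.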

*Cf. Wilks, S. S., “Certain Generalizations in the Analysis of Variance,”
Biometrika, 24, 1933, 471-494.

If we cut across $V_s$ with the “plane” $T = k_1 t_1 + k_2 t_2 + \cdots + k_s t_s$
we get a space $V_{s-1}$. To find the generalized variance of the
distribution in $V_{s-1}$ consider an orthogonal transformation of the form

$$t_i = \sum_j a_{ij} T_j, \qquad i = 1, 2, \ldots, s. \tag{6}$$

The a’s are such that $\sum_j a_{ij}^2 = 1$, $\sum_j a_{ij} a_{i'j} = 0$, $i \neq i'$. If we solve
equations (6) for the T’s, getting

$$T_i = \sum_j b_{ij} t_j, \tag{7}$$

then similar relationships hold among the b’s. Substituting (6) into
(5) we get

$$\sum_{i,j} B_{ij} T_i T_j = \theta, \qquad \Bigl(B_{ij} = \sum_{\alpha,\beta} b_{i\alpha} b_{j\beta} A_{\alpha\beta}\Bigr), \tag{8}$$

as the equation of the family of ellipsoids in T-space. The points
in the T-space for which one of the T’s, say $T_1$, is constant form a
space $V_{s-1}$ of the kind we are interested in. Since we have used an
orthogonal transformation which leaves distances unchanged, the
generalized variance of the distribution in the space for which $T_1$ =
constant is simply the generalized variance of $T_2, \ldots, T_s$ for fixed $T_1$.
This generalized variance is equal to the reciprocal of the determinant
$|B_{ij}|$, $i, j = 2, 3, \ldots, s$, which by Sylvester’s theorem in determinants
has the value $B^{11} |B_{ij}|$, $i, j = 1, 2, \ldots, s$, where $B^{11}$ is the element in
the first row and first column of the reciprocal of the matrix $\|B_{ij}\|$.
But the reciprocal of $\|B_{ij}\|$ is the matrix of variances and covariances
of the T’s, so that $B^{11}$ is the variance of $T_1$ and is $\sum_{i,j} b_{1i} b_{1j} r_{ij}$;
moreover $|B_{ij}| = |b_{ij}|^2 |A_{ij}|$, $|A_{ij}| = 1/|r_{ij}|$, and $|b_{ij}| = 1$. Hence the
generalized variance of $T_2, \ldots, T_s$ for $T_1$ fixed is

$$U = \frac{|r_{ij}|}{\sum_{i,j} b_{1i} b_{1j} r_{ij}}, \tag{9}$$

which is independent of $T_1$.

The problem now is to choose values of $b_{11}, b_{12}, \ldots, b_{1s}$ so that U
is a minimum. We can see most easily geometrically what the answer
will be. The b’s can be represented as coordinates of a point in an
s-dimensional space on the unit sphere $\sum_i b_{1i}^2 = 1$ with center at the
origin. Now $\sum_{i,j} b_{1i} b_{1j} r_{ij} = \varphi^2$ is a family of ellipsoids with centers at
the origin, and the larger the value of $\varphi^2$ the smaller the value of U.
The largest of this family of ellipsoids which touches the unit sphere
will furnish values of the b’s which minimize U. The largest ellipsoid
is the one with smallest semi-axis equal to 1, there being, in this
case, only one point (together with its reflection through the origin)
lying on both the sphere and the ellipsoid. The coordinates of this
point represent the direction of the smallest axis of the ellipsoid.
Since $\|r_{ij}\| = \|A_{ij}\|^{-1}$, we know from well-known properties of positive
definite matrices that the lengths of the axes of the ellipsoids
$\sum_{i,j} b_{1i} b_{1j} r_{ij} = \varphi^2$ are proportional to the reciprocals of the lengths of
the respective axes of the ellipsoids $\sum_{i,j} b_{1i} b_{1j} A_{ij} = \theta$, which are the
same as those given by (5). Therefore we have proved

Theorem 2: The hyperplanes $V_{s-1}$ in the t-space $V_s$ in which the
generalized variance is a minimum are those normal to the direction
of the longest axis of the ellipsoids (5) of constant density.
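Theorem 2 can be checked numerically by evaluating the quantity U of equation (9) for many unit directions. The sketch below is illustrative only: the 3 x 3 correlation matrix is hypothetical, the function names are ours, and the direction of the longest axis is found by a plain power iteration (an assumption of this sketch, not a step of the proof).

```python
import math
import random


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def unit(v):
    """Rescale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def quad_form(r, b):
    """Sum over i, j of b_i b_j r_ij: the variance of T_1 for direction b."""
    n = len(b)
    return sum(b[i] * r[i][j] * b[j] for i in range(n) for j in range(n))


def conditional_generalized_variance(r, b):
    """U of equation (9) for a unit vector b."""
    return det3(r) / quad_form(r, b)


def principal_direction(r, steps=200):
    """Direction of the longest axis, by power iteration on r."""
    n = len(r)
    b = unit([1.0] * n)
    for _ in range(steps):
        b = unit([sum(r[i][j] * b[j] for j in range(n)) for i in range(n)])
    return b


# Hypothetical positive correlations among three subtests.
R = [[1.0, 0.6, 0.5],
     [0.6, 1.0, 0.4],
     [0.5, 0.4, 1.0]]

b_star = principal_direction(R)
u_star = conditional_generalized_variance(R, b_star)

# Compare with U for 1000 randomly chosen unit directions.
random.seed(0)
others = [unit([random.gauss(0.0, 1.0) for _ in range(3)]) for _ in range(1000)]
u_min_other = min(conditional_generalized_variance(R, b) for b in others)

print("U along the longest axis:      ", round(u_star, 6))
print("smallest U among random trials:", round(u_min_other, 6))
```

No random direction gives a smaller U than the direction of the longest axis, and with all correlations positive that direction has all coordinates positive, as stated in Section 3.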
The “best” linear function of the t’s from the point of view of
generalized variance is therefore

$$T = \rho(\gamma_1 t_1 + \gamma_2 t_2 + \cdots + \gamma_s t_s) + m, \tag{10}$$

where $\gamma_1 : \gamma_2 : \cdots : \gamma_s$ gives the direction of the largest axis of the
ellipsoids (5) and $\rho$ and m are arbitrary constants. In terms of the
z’s,

$$T = \rho\left(\frac{\gamma_1 z_1}{\sigma_1} + \frac{\gamma_2 z_2}{\sigma_2} + \cdots + \frac{\gamma_s z_s}{\sigma_s}\right) + m. \tag{11}$$

Hotelling* has given an iterative method of finding the direction of
the largest axis from the matrix of correlations $\|r_{ij}\|$. The scheme
is as follows: We write

$$\gamma'_j = \sum_i \gamma_i r_{ij}, \qquad j = 1, 2, \ldots, s, \tag{12}$$

and start with a trial set of values of the $\gamma$’s, which, in problems of
weighting subtests, can almost always be taken equal to 1. We then
get the values $\gamma'_j = \sum_i r_{ij}$, $j = 1, 2, \ldots, s$. Since we are only interested
in any set of numbers proportional to the $\gamma$’s it is convenient to
“normalize” each $\gamma'_j$ by expressing it as a multiple of the largest $\gamma'$
(that is, by dividing by the largest). We take the normalized values of
$\gamma'_j$ and substitute them in (12) for the $\gamma_j$ respectively, and repeat the
normalizing process, until a stage is reached where the normalized
values agree, to the desired number of decimal places, with the ones
obtained in the preceding step. Accuracy to three decimal places can
ordinarily be obtained in 4 or 5 iterations. When the $\gamma$’s are finally
obtained they can all be multiplied by any convenient positive
constant. Hotelling’s iterative method is applicable for all cases (i.e.,
whether r’s are positive or negative) in which there is a largest
ellipsoidal axis. The iterative process will break down only in the
highly improbable case in which the initial set of $\gamma$’s are those of a
line through the origin perpendicular to the largest axis. It is
convenient to refer to this method of obtaining weights as Method A.

*H. Hotelling, “Analysis of a Complex of Statistical Variables into
Principal Components,” The Journal of Educational Psychology, 24, 1933, 429.

Illustrative Example 
In 1930, the verbal sections of the Scholastic Aptitude Test of
the College Entrance Examination Board* consisted of three subtests,
which were (1) Antonyms ($z_1$), (2) Double Definitions ($z_2$), (3)
Paragraph Reading ($z_3$), and for 3318 boys taking the test, $\sigma_1 = 8.41$,
$\sigma_2 = 8.42$, $\sigma_3 = 8.48$, $r_{12} = .8670$, $r_{13} = .8005$, $r_{23} = .8054$. We set up
the equations

$$\gamma'_1 = \gamma_1 + .8670\gamma_2 + .8005\gamma_3,$$
$$\gamma'_2 = .8670\gamma_1 + \gamma_2 + .8054\gamma_3,$$
$$\gamma'_3 = .8005\gamma_1 + .8054\gamma_2 + \gamma_3.$$

Taking $\gamma_1 = \gamma_2 = \gamma_3 = 1$ as trial values, we find the first normalized
approximations to the correct $\gamma$’s to be $\gamma_1 = .9979$, $\gamma_2 = 1.0000$,
$\gamma_3 = .9747$. Using these in the foregoing equations for the $\gamma$’s, we find
$\gamma_1 = .998$, $\gamma_2 = 1.000$, $\gamma_3 = .973$. The fact that the $\gamma$’s are so nearly
equal is due to the r’s being so nearly equal. Rounding to two
decimals, the “best” linear combination, in the sense of minimum
generalized variance, is given by

$$T = \rho\left(\frac{z_1}{8.41} + \frac{z_2}{8.42} + \frac{.97\,z_3}{8.48}\right) + m,$$

where $\rho$ and m are arbitrary and can be chosen so that the mean and
variance of T will have any desired values. An excellent approximation
to T for practical purposes in this example would be, of course,
to consider the $\gamma$’s equal, as was actually done in obtaining the total
verbal score.
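The iteration of equation (12) is easily mechanized. The following is a minimal sketch applied to the three correlations quoted in this example; the function name, the stopping tolerance and the iteration cap are our own assumptions rather than part of Hotelling's description.

```python
def hotelling_weights(r, tol=1e-6, max_iter=100):
    """Iterate equation (12), normalizing by the largest value at each step.

    Returns the normalized weights and the number of iterations used.
    """
    s = len(r)
    gamma = [1.0] * s  # trial values, all taken equal to 1
    for iteration in range(1, max_iter + 1):
        new = [sum(gamma[i] * r[i][j] for i in range(s)) for j in range(s)]
        largest = max(new, key=abs)
        new = [value / largest for value in new]
        # Stop when successive normalized sets agree to the tolerance.
        if max(abs(a - b) for a, b in zip(new, gamma)) < tol:
            return new, iteration
        gamma = new
    return gamma, max_iter


# Correlations among Antonyms, Double Definitions and Paragraph Reading.
R_SAT = [[1.0, 0.8670, 0.8005],
         [0.8670, 1.0, 0.8054],
         [0.8005, 0.8054, 1.0]]

gamma, n_iter = hotelling_weights(R_SAT)
print([round(g, 3) for g in gamma], "after", n_iter, "iterations")
```

Carried to convergence, the sketch gives weights of about .998, 1.000 and .973, in agreement with the three-decimal values reported above.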


*Report on the Scholastic Aptitude Test 1930, College Entrance Examination
Board, New York.


When correlations are approximately equal.


It should be pointed out that when all of the correlations among
the subtests are equal and positive, then an ellipsoid in t-space will be
long and cigar-shaped (s-dimensional, of course), having one long
principal axis and s-1 small axes all equal in length. This follows
from the fact that the squares of the lengths of the principal axes are
proportional to the roots of the characteristic equation
$|r_{ij} - \delta_{ij}\lambda| = 0$, where $\delta_{ij} = 1$ for $i = j$ and 0 for $i \neq j$. If the r’s are
all equal to r, say, the left-hand side of the characteristic equation is

$$(1 - r - \lambda)^{s-1}\,\bigl(1 - \lambda + (s-1)r\bigr),$$

which has s-1 roots equal to $1 - r$ and one root equal to $1 + (s-1)r$.
The long axis is thus proportional to $\sqrt{1 + (s-1)r}$ and the small
ones to $\sqrt{1 - r}$. Thus, the larger the number of subtests (i.e., the
larger the value of s) and the larger the value of r the more
needle-shaped the ellipsoids become, thus making the score T a more highly
stable index for ranking individuals taking the test. If the
correlations are not very different, so that the squares of the differences
between each r and the mean of the r’s are negligible, as is often the case
in practice, the foregoing results hold approximately when each
correlation is replaced by the mean. In such a situation taking the $\gamma$’s as
equal and weighting the z’s by reciprocals of standard deviations is
justifiable. Sizable variations in the r’s may cause considerable
variations not only in the $\gamma$’s but also in the relative lengths of the
principal ellipsoidal axes. For example, the correlations may be such as
to produce two large axes and s-2 small ones. This, however, raises
the question of multiple factor analysis, which has been discussed by
Hotelling,* Thurstone† and others.
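The two distinct roots in the equal-correlation case can be verified directly. In the sketch below the number of subtests and the common correlation are hypothetical values chosen only for illustration, and the helper names are ours.

```python
import math


def equicorrelation(s, r):
    """s x s correlation matrix with every off-diagonal entry equal to r."""
    return [[1.0 if i == j else r for j in range(s)] for i in range(s)]


def matvec(m, v):
    """Product of matrix m with vector v."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


s, r = 5, 0.6  # hypothetical: five subtests, common correlation .6
R = equicorrelation(s, r)

# The direction 1 : 1 : ... : 1 belongs to the root 1 + (s-1) r.
ones = [1.0] * s
long_root = matvec(R, ones)[0] / ones[0]

# Any contrast such as 1 : -1 : 0 : ... : 0 belongs to the root 1 - r.
contrast = [1.0, -1.0] + [0.0] * (s - 2)
short_root = matvec(R, contrast)[0] / contrast[0]

# Ratio of the long axis to each short axis.
axis_ratio = math.sqrt(long_root / short_root)
print(long_root, short_root, round(axis_ratio, 3))
```

Increasing either s or r in the sketch increases the ratio of the long axis to the short ones, which is the "needle-shaped" behavior described in the text.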

Other criteria which support the use of the $\gamma$ weights have been
discussed by Horst‡ and also by Edgerton and Kolbe.‖ Horst has
obtained the $\gamma$ weights by maximizing the variance of $T = \sum_i k_i t_i$,
where $\sum_i k_i^2 = 1$, while Edgerton and Kolbe have obtained them by
essentially minimizing the sum of the variances $\sum_\alpha \sigma_\alpha^2$ for all
individuals, where $\sigma_\alpha^2$ is the variance of the s quantities
$k_1 t_1, k_2 t_2, \ldots, k_s t_s$ for the $\alpha$-th individual, and $\sum_i k_i^2$ is constant.

*H. Hotelling, loc. cit., pp. 417-441, 498-520.
†L. L. Thurstone, The Vectors of Mind, The University of Chicago Press,
1935.
‡Horst, Paul, “Obtaining a Composite Measure from a Number of Different
Measures of the Same Attribute,” Psychometrika, 1, 1936, 53-60.
‖Edgerton, H. A. and Kolbe, Laverne E., “The Method of Minimum Variation
for the Combination of Criteria,” Psychometrika, 1, 1936, 183-188.


4. Weighting by Equalizing Correlation Coefficients between each
Subtest and the Total Score (Method B).


Another system of weighting the t’s which appears to be of
interest is that of choosing the k’s so that each t is equally correlated
with the total $T = k_1 t_1 + k_2 t_2 + \cdots + k_s t_s$. Stated more precisely,
we equate the correlations $R_j$ between $t_j$ and T (j = 1, 2, ..., s), which
are given by

$$R_j = \frac{\sum_i k_i r_{ij}}{\sqrt{\sum_{i,l} k_i k_l r_{il}}}, \tag{13}$$

and solve for the k’s. This is equivalent to solving the equations

$$\sum_i k_i r_{ij} = \rho, \qquad j = 1, 2, \ldots, s, \tag{14}$$

where $\rho$ is any convenient constant different from zero. If the t’s are
linearly independent, which will always be the case in testing, the
linear equations (14) will have a unique solution which can be
conveniently found, for example, by the Doolittle method of solving
normal equations.

If there are sufficiently strong reasons for having some of the
subtests “count more heavily” in the sense of having them correlate
more highly than others with the total score, then one would solve
the linear equations

$$\sum_i k_i r_{ij} = \rho c_j, \qquad j = 1, 2, \ldots, s, \tag{15}$$

where $c_1 : c_2 : \cdots : c_s$ is the desired ratio for $R_1 : R_2 : \cdots : R_s$, and
the value of $\rho$ is an arbitrary non-zero constant.

As an illustrative example, consider the one used in Section 3,
and consider the problem of finding the values of the k’s so that
$R_1 = R_2 = R_3$. Equations (14) become

$$k_1 + .8670k_2 + .8005k_3 = \rho,$$
$$.8670k_1 + k_2 + .8054k_3 = \rho,$$
$$.8005k_1 + .8054k_2 + k_3 = \rho,$$

from which we find that $k_1 : k_2 : k_3 = .77 : .74 : 1.00$. For convenience
let the values of the k’s expressed in terms of the largest one,
which satisfy equations (15), be denoted by $\theta_1, \theta_2, \ldots, \theta_s$, and let the
method of obtaining the $\theta$ weights be referred to as Method B. Thus,
in the present example, the $\theta$ weights are .77, .74, 1.00. Therefore,
from the point of view of equalized correlations between the total
score and the subtest scores the “best” linear combination of the z’s
for the total score is

$$T = \rho\left(\frac{.77\,z_1}{8.41} + \frac{.74\,z_2}{8.42} + \frac{z_3}{8.48}\right) + m,$$

where $\rho$ and m are arbitrary. It will be noticed that if the R’s are to
be made equal, then equality of the correlations among the subtests
implies equal $\theta$ weights.
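Since Method B requires only the solution of linear equations, it can be sketched in a few lines. The code below is an illustration under our own assumptions: it uses plain Gauss-Jordan elimination in place of the Doolittle method named in the text, takes the arbitrary constant equal to 1, and the function names are ours. The optional argument c corresponds to the preassigned ratios of equation (15).

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(n):
            if row != col:
                factor = m[row][col] / m[col][col]
                for k in range(col, n + 1):
                    m[row][k] -= factor * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def method_b_weights(r, c=None):
    """Weights solving equations (14), or (15) when ratios c are supplied,
    expressed in terms of the largest weight."""
    s = len(r)
    c = [1.0] * s if c is None else c
    k = solve_linear(r, c)  # arbitrary constant taken as 1
    largest = max(k, key=abs)
    return [value / largest for value in k]


def subtest_total_correlations(r, k):
    """Correlation of each standardized subtest with the weighted total,
    as in equation (13)."""
    s = len(r)
    var_total = sum(k[i] * k[j] * r[i][j] for i in range(s) for j in range(s))
    return [sum(k[i] * r[i][j] for i in range(s)) / math.sqrt(var_total)
            for j in range(s)]


R_SAT = [[1.0, 0.8670, 0.8005],
         [0.8670, 1.0, 0.8054],
         [0.8005, 0.8054, 1.0]]

theta = method_b_weights(R_SAT)
print([round(w, 2) for w in theta])
print([round(x, 4) for x in subtest_total_correlations(R_SAT, theta)])
```

For the correlations of this example the weights come out as .77, .74 and 1.00 to two decimals, and the three subtest-total correlations printed on the second line are equal, as the method requires.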


5. Weighting by Equalizing Increments in Variance of total score 
due to each of the subtests (Method C). 


Another basis for determining weights which suggests itself
from a consideration of weighting non-correlated variables in linear
functions is that of making the increments to the variance of the
total scores, due to each of the several subtests, equal, or more
generally, proportional to preassigned numbers. For example, it may be
desirable to decrease the scattering effects of some subtests on the
variation of the composite scores and increase that of others, because
of different a priori importance of the subtests. More precisely, the
method can be stated thus: Let $T = k_1 t_1 + k_2 t_2 + \cdots + k_s t_s$ be a
linear function of the t’s. The variance of T for all variables is

$$\sigma_T^2 = \sum_{i,j=1}^{s} k_i k_j r_{ij}. \tag{16}$$

The variance of T omitting $t_1$ is

$$\sigma_{T,1}^2 = \sum_{i,j=2}^{s} k_i k_j r_{ij}, \tag{17}$$

with similar meanings for $\sigma_{T,2}^2, \sigma_{T,3}^2, \ldots, \sigma_{T,s}^2$. Let

$$\Delta_1 = \sigma_T^2 - \sigma_{T,1}^2 = k_1^2 + 2k_1 \sum_{i=2}^{s} k_i r_{1i}, \tag{18}$$

with similar meaning for $\Delta_2, \Delta_3, \ldots, \Delta_s$. It should be noticed that, due
to the presence of correlation among the t’s, the $\Delta$’s are not mutually
exclusive contributions in the sense that the sum of the $\Delta$’s is equal to
$\sigma_T^2$. In the case of independent t’s, $\Delta_1 + \Delta_2 + \cdots + \Delta_s = \sigma_T^2$. Now
consider the problem of choosing values of the k’s so that the $\Delta$’s are
proportional to any set of positive numbers $L_1, \ldots, L_s$ chosen in
advance, that is, $\Delta_j = \rho L_j$, j = 1, 2, ..., s, where $\rho$ is a positive constant.
For the present consider the case of equal $\Delta$’s, that is, the $L_j = 1$.
We then have

$$k_j\Bigl(2\sum_i k_i r_{ij} - k_j\Bigr) = \rho, \qquad j = 1, 2, \ldots, s. \tag{19}$$


Considered geometrically in the generalized first quadrant of the
s-dimensional space of the k’s, each equation in (19) represents a certain
“ruled” hyper-surface. For example, for j = 1, we get a “ruled”
hyper-surface which is “tangent” to the (s-1)-dimensional space of
$k_2, \ldots, k_s$ for which $k_1 = 0$ in the infinite part of this subspace, and
which “sweeps” down to the $k_1$ axis, cutting it in the point $Q_1$:
$(\sqrt{\rho}, 0, 0, \ldots, 0)$, so that the surface is concave outward from the
origin. The “rulings” are such that we get a hyperplane for every
constant value of $k_1 \neq 0$. Thus, for j = 1 the surface appears as a
sort of curved “roof” in the first generalized quadrant with vertex
at $Q_1$. The other surfaces (i.e., j = 2, 3, ..., s) are similarly shaped
“roof-surfaces” with their vertices on the respective axes. Therefore,
if the correlations among the t’s are positive it will be seen that for
each value of $\rho$ the surfaces necessarily intersect in one point in the
generalized first quadrant of the k-space, and as $\rho$ changes, this point
will slide along a line radiating from the origin. The direction numbers
$\omega_1 : \omega_2 : \cdots : \omega_s$ of this line when used as weights for the subtest
scores will equalize the $\Delta$’s. If the correlations among subtests are all
positive, then since each surface lies in the half space for which a
corresponding k is positive, it follows that in no generalized quadrant,
except the first, are there more than s-1 of the surfaces. Hence the
solution is unique. By similar argument, it can be shown that for
positive or negative values (non-zero) of the correlations there is one
and only one solution to equations (19). The $\omega$’s will, of course, not
all be positive for the various cases. An easy practical way of finding
the direction of the line in the k-space giving the solution is by
an iterative method. It is sufficient to consider $\rho = 1$, and write from
(22)

$$\omega_j = \frac{1}{2\sum_i \omega_i r_{ij} - \omega_j}, \qquad j = 1, 2, \ldots, s. \tag{20}$$





Now choose trial values of the $\omega$’s. In practice, where positive
correlations are dealt with, it is always safe to take

$$\omega_1 = \omega_2 = \cdots = \omega_s = 1.$$

We then get an “improved” set of $\omega$’s from (20). The “improved”
values are normalized, for convenience, by dividing by the largest
one. The resulting normalized values are inserted in (20) for the $\omega$’s
and new values of $\omega$’s obtained. The process is similar to that
expressed by the iteration transformation (12). The proof that this
“iteration transformation” leads to a unique solution is rather
tedious and will be omitted. Thus, the “best” linear function of the z’s
from the point of view of equalized increments of variance is

$$T = \rho\left(\frac{\omega_1 z_1}{\sigma_1} + \frac{\omega_2 z_2}{\sigma_2} + \cdots + \frac{\omega_s z_s}{\sigma_s}\right) + m, \tag{21}$$

where the $\omega$’s satisfy the equations

$$\omega_j\Bigl(2\sum_i \omega_i r_{ij} - \omega_j\Bigr) = 1, \qquad j = 1, 2, \ldots, s, \tag{22}$$

and $\rho$ and m are any convenient constants. The method of obtaining
$\omega$-weights will be referred to as Method C.

In Method C, as in the cases of Methods A and B, equality of
correlations among subtests implies equality of $\omega$’s.

As an illustrative example, consider the same one used in Section
3. From (20) the iteration transformation for this example is

$$\omega_1 = \frac{1}{\omega_1 + 1.7340\omega_2 + 1.6010\omega_3},$$
$$\omega_2 = \frac{1}{1.7340\omega_1 + \omega_2 + 1.6108\omega_3},$$
$$\omega_3 = \frac{1}{1.6010\omega_1 + 1.6108\omega_2 + \omega_3}.$$

Beginning with trial values $\omega_1 = \omega_2 = \omega_3 = 1$ and making five
iterations similar to those made in the example of Section 3, we find,
accurate to 3 decimal places, $\omega_1 = .967$, $\omega_2 = .963$, $\omega_3 = 1.000$.
Substituting in (21), the “best” linear verbal score in the sense of
equalized increments of variance is, to two decimals,

$$T = \rho\left(\frac{.97\,z_1}{8.41} + \frac{.96\,z_2}{8.42} + \frac{z_3}{8.48}\right) + m,$$

which, on account of nearly equal correlations, is not very different
from that obtained using $\gamma$’s.
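The iteration of equation (20) can be sketched as follows for the correlations of this example. The function names, the fixed number of iterations and the optional argument for preassigned increments are our own assumptions; the sketch makes no attempt to reproduce the omitted convergence proof.

```python
def method_c_weights(r, targets=None, n_iter=50):
    """Iterate equation (20), normalizing by the largest value each time.

    targets holds the preassigned numbers to which the increments are to
    be proportional; by default all are 1 (equal increments).
    """
    s = len(r)
    targets = [1.0] * s if targets is None else targets
    omega = [1.0] * s  # trial values
    for _ in range(n_iter):
        new = [targets[j] / (2.0 * sum(omega[i] * r[i][j] for i in range(s))
                             - omega[j])
               for j in range(s)]
        largest = max(new)
        omega = [value / largest for value in new]
    return omega


def variance_increments(r, k):
    """Increment in the variance of the total due to each subtest,
    i.e. k_j (2 sum_i k_i r_ij - k_j), as in equation (18)."""
    s = len(r)
    return [k[j] * (2.0 * sum(k[i] * r[i][j] for i in range(s)) - k[j])
            for j in range(s)]


R_SAT = [[1.0, 0.8670, 0.8005],
         [0.8670, 1.0, 0.8054],
         [0.8005, 0.8054, 1.0]]

omega = method_c_weights(R_SAT)
print([round(w, 2) for w in omega])
print([round(d, 6) for d in variance_increments(R_SAT, omega)])
```

Run on these correlations, the sketch gives weights of roughly .97, .96 and 1.00, in line with the two-decimal weights above, and the three increments printed on the second line are equal. Passing targets such as 2, 1, 1 makes the first increment twice each of the others.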

If there are strong grounds for weighting the various subtests
in such a way that the increments in variances will be different, that
is, in proportion to given values, say $L_1, L_2, \ldots, L_s$, we would go through
the same process as before except that the iteration equations (20)
would be multiplied on the right by $L_j$. For example, if it were
considered desirable to have subtest $S_{D_1}$ contribute an increment which
would be twice as large as that of each other subtest, we would choose
the L’s as 2, 1, 1, ..., 1.

In general, Methods B and C will not give the same results,
and since the purpose of a test for ability in D is to rank individuals
with respect to D with the greatest possible stability, Method A
appears to be the more fundamental. Method A is somewhat more
logical than the other two in certain extreme, and unusual, cases. For
example, if the correlations of a subtest with each of the other
subtests are all zero, the remaining correlations being positive, that
subtest would receive a zero $\gamma$ weight but a non-zero $\theta$ and a non-zero $\omega$
weight. It is reasonable that this subtest should receive a zero
weight, because if a subtest is completely independent of the remaining
subtests it is worthless as far as its power to rank individuals
with respect to D is concerned. If all subtests are independent, the $\gamma$
weights are indeterminate (one set of weights is as good as another),
there being no stability in ranking of students by linear scores
with respect to the aggregate of subtests. In the case of independence,
Method B would give a unique set of weights, and Method C would
give a unique set, except for sign, which would be meaningless as far
as ranking power of the composite test is concerned.
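Both extreme cases can be seen in a small numerical sketch. The correlation matrices below are hypothetical, and the function is our own rendering of the iteration (12), run for a fixed number of steps rather than to a stated tolerance.

```python
def largest_axis_direction(r, start=None, n_iter=200):
    """Iteration in the manner of (12), normalizing by the largest entry."""
    s = len(r)
    gamma = list(start) if start is not None else [1.0] * s
    for _ in range(n_iter):
        new = [sum(gamma[i] * r[i][j] for i in range(s)) for j in range(s)]
        largest = max(new, key=abs)
        gamma = [value / largest for value in new]
    return gamma


# Hypothetical case 1: subtest 4 is uncorrelated with subtests 1 to 3,
# which are positively correlated among themselves.
R_MIXED = [[1.0, 0.8, 0.7, 0.0],
           [0.8, 1.0, 0.6, 0.0],
           [0.7, 0.6, 1.0, 0.0],
           [0.0, 0.0, 0.0, 1.0]]
gamma_mixed = largest_axis_direction(R_MIXED)
print([round(g, 3) for g in gamma_mixed])

# Hypothetical case 2: all subtests independent. The iteration simply
# returns whatever trial values it was given (rescaled), so the weights
# are indeterminate.
IDENTITY = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
from_ones = largest_axis_direction(IDENTITY)
from_other = largest_axis_direction(IDENTITY, start=[1.0, 2.0, 3.0])
print(from_ones, [round(g, 3) for g in from_other])
```

In the first case the weight of the uncorrelated subtest is driven to zero, while the other three weights remain positive; in the second the result depends entirely on the trial values, which is the indeterminacy described above.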

Again, if we have two groups of subtests which are independent
of each other in the sense that the correlation is zero between each
subtest in one group and every subtest in the other, then we may
think of breaking the t-space up into two component or factor spaces,
corresponding to the two groups of subtests, which are perpendicular
to each other, each component space having its own ellipsoid family.
Suppose one of these ellipsoid families, say $E_1$, has a relatively longer
major axis than has the other ellipsoid family $E_2$. This largest axis
will then be the largest of the composite ellipsoid for the product of
the two spaces, and its direction will be specified by the direction of
the longest axis of $E_1$ in its own component space, with the direction
numbers all equal to zero for the other factor space. In other words,
the $\gamma$’s in (10) will all be zero for the t’s belonging to that component
space with the ellipsoid family $E_2$. Thus, using $\gamma$ weights, the less
highly intercorrelated group of subtests would be completely ignored,
but not so in the case of $\theta$ and $\omega$ weights. This again is a plausible
point in favor of the $\gamma$’s because the linear combination of independent
groups of subtests makes little or no sense from the point of
view of tests. Scores should be compared separately from the groups,
and kept distinct.

There is another important point which cannot be overlooked in
connection with Methods B and C, especially if R’s or $\Delta$’s are chosen
proportional to an a priori set of unequal numbers. There should be
strong objective reasons for choosing such an a priori set of unequal
numbers. The procedure of determining a set of weights so that the
R’s (or $\Delta$’s) will be proportional to a set of unequal numbers more
or less arbitrarily named is likely to be a spurious refinement, even
though there may be some “competent feeling” on the part of the
person assigning the numbers that he has mentioned a reasonable set of
numbers.


6. Summary. 
In problems of constructing linear functions of positively
correlated variables to serve as indices or scores, as for example, in
testing, when no independent variable or ultimate criterion is available,
the ordinary methods of multiple correlation and least squares are
not applicable. The first problem considered in this paper is that
of the ultimate correlation of two different linear functions of n
correlated variables for large n. It is shown that under certain
reasonable conditions the mean value of the correlation between the two
functions differs from unity, by terms of order 4 and the variance


of the correlation is of order as showing that the stochastic ap- 


proach to unity is quite rapid. The larger the mean value of the cor- 
relation between pairs of variables, and the smaller the coefficient 
of variation of the weights (i.e., coefficients) in each linear function, 
the more rapidly the mean value of the correlation between the linear 
function approaches 1. 

For small values of n such as those encountered in problems of
linearly combining subtest scores to obtain a final score, three
methods are considered for determining “best” sets of weights. Using the
language of tests and assuming normality of the distribution of
subtest scores it is shown that if one chooses weights (Method A)
proportional to the direction of the largest axis of ellipsoids of constant
density, then these weights ($\gamma$’s) are such that the generalized
variance of the subtest scores of all individuals receiving a given linear
score with $\gamma$ weights is a minimum, thus maximizing the stability of
the ordering of individuals.

A second method (Method B) is considered for determining 
weights by equalizing the correlations between each subtest and the 
total linear score. It is also shown how the weights can be deter- 
mined so that these correlations will be proportional to a given set 
of numbers. 

A third method (Method C) is discussed for determining 
weights which will equalize the increments of the variance of the to- 
tal score obtained by including each subtest with the remaining sub- 
tests. The method extends to the case in which these increments are 
made proportional to any given set of positive numbers. 

Iterative schemes are set up and illustrated for computing 
weights for Methods A and C. Weights for Method B can be found 
from the explicit solution of a set of linear equations. Method B is 
also illustrated by an example. 
AN ABAC FOR DETERMINING THE MEAN DEVIATION OF
A CLASS FROM THE GENERAL MEAN 


JACK W. DUNLAP 
University of Rochester and Salvatore Di Michael, New York City 


The abac is presented with instructions for use. 


In investigations dealing with abstract and complex traits such as 
courage, inhibition, honesty, morality, attitudes and in other studies 
which make use of qualitative estimates of success in form of letter 
grades, the common practice has been to classify, relatively, the in- 
dividuals under consideration. Often in such problems the experi- 
menter has found it convenient to express qualitative classifications 
in quantitative terms. This task is relatively simple if the form of 
the distribution is known or can be assumed. Generally the worker 
assumes a normal distribution. If the assumption of normality is
made, then knowing the percentage of individuals falling in a particu- 
lar group and the limits of that group, the specific problem is to 
determine in sigma units the average distance of the class from the 
mean of the total population. 

With the distribution known or assumed to be normal, the following
formula gives the desired value:

$$\frac{\bar{x}}{\sigma} = \frac{z_1 - z_2}{p_2 - p_1},$$

where $\bar{x}/\sigma$ = mean deviation of a portion of a unit normal distribution
      $z_1$ = height of the ordinate corresponding to $p_1$
      $z_2$ = height of the ordinate corresponding to $p_2$
      $p_1$ = the lower limit of the class in terms of proportion
      $p_2$ = the upper limit of the class in terms of proportion.


The labor and time expended in computing any number of $\bar{x}/\sigma$
values is considerable and can be substantially reduced by use of the
accompanying abac. In constructing the abac it was assumed that a
4.8 sigma range would be sufficient for most problems.
In using the abac, problems fall into three types:
A) $p_2 > p_1 > 50\%$
B) $p_2 > 50\% > p_1$
C) $50\% > p_2 > p_1$
Problems of type (A) and (B) can be solved directly from the
abac; problems of type (C) involve an intermediate step, namely,
determining the complements of $p_1$ and $p_2$ and entering the abac with
these values. In the case of type (C) the result is always negative.


Examples:
Type A: Likert, R., “A Technique for the Measurement of Attitudes,”
Arch. Psychol., No. 140, presents the following data on page 23:


Statement Number 16 of the Internationalism Scale

                      Strongly                            Dis-        Strongly
Alternative           Approve     Approve     Undecided   approve     Disapprove

Percentage Checking   13%         43%         21%         13%         10%

Class Limits      0%          13%         56%         77%         90%         100%

Sigma Value           -1.63       -.42*       .43         .99         1.75*





First insert the class limits as above in row 3. Consider the class
“Disapprove” in which falls 13% of the cases. The limits of this
class are 90% and 77%. Let $p_2$ = 90% and $p_1$ = 77%. Then follow
down the curved line for 90 until it intersects the vertical line for
$p_1$ = 77. The sigma value is read off the left hand scale as .99.

Type B: This is solved in precisely the same manner as Type A.

Consider the class “Approve” whose limits are 13% and 56%.
Now locate the $p_2$ line for 55% and follow this until it intersects the $p_1$
line for 13%. The $p_2$ lines at this point are five units apart, so it is
necessary to go up one fifth of the distance between $p_2$ = 55 and $p_2$ =
60. Having interpolated for $p_2$ = 56, then read the corresponding
sigma value from the left hand scale as -.42.

Type C: When both limits are below 50%, it is necessary to
determine the complements of the limits and enter the abac with these
values, attaching the negative sign to the answer. For example, the
class “Strongly Approve” has the limits of 0.0% and 13%. The
complements are 100% and 87%. Follow down the $p_2$ line for 100 until it
intersects with the $p_1$ line, 87, and then read off the result from the
left hand scale as 1.60. Affix the negative sign so that the result is
-1.60.

*Likert reports values of -.43 and 1.76 which are slightly in error.

It is worth while to note that in cases where interpolation is
necessary it may be more convenient to enter the abac with $p_1$ and
follow this line up to the nearest $p_2$ value. With a little practice, the
chart can be read accurately to two decimal places, with an occasional
error in the second decimal place.
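Where the abac is not at hand, the formula it represents can be evaluated directly. The sketch below is an illustration only: it relies on the NormalDist class of the Python standard library, the function names are ours, and the ordinate at a limit of 0% or 100% is taken as zero.

```python
from statistics import NormalDist

_UNIT_NORMAL = NormalDist()  # mean 0, standard deviation 1


def ordinate(p):
    """Height of the unit normal curve at the point below which the
    proportion p of the cases falls (zero at p = 0 and p = 1)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return _UNIT_NORMAL.pdf(_UNIT_NORMAL.inv_cdf(p))


def class_mean_sigma(p_lower, p_upper):
    """Mean deviation of a class, in sigma units: the difference of the
    ordinates at the lower and upper limits, divided by the proportion
    of cases in the class."""
    return (ordinate(p_lower) - ordinate(p_upper)) / (p_upper - p_lower)


# Class limits for Statement Number 16 of the Internationalism Scale.
limits = [0.00, 0.13, 0.56, 0.77, 0.90, 1.00]
labels = ["Strongly Approve", "Approve", "Undecided",
          "Disapprove", "Strongly Disapprove"]

sigma_values = [class_mean_sigma(lo, hi) for lo, hi in zip(limits, limits[1:])]
for label, value in zip(labels, sigma_values):
    print(f"{label:20s} {value:+.3f}")
```

No separate treatment of Type C is needed here, since the sign follows from the formula itself. For the five classes of the Likert example the computed values agree with the tabled sigma values to within about .01, which is the reading accuracy claimed for the chart.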
COMPARISON OF TWO FACTORIAL ANALYSES


KARL J. HOLZINGER AND HARRY H. HARMAN 
University of Chicago 


A Bi-factor analysis is made of Professor Thurstone’s battery
of fifty-seven tests employing his tetrachoric correlations. Although
this analysis is made entirely independently of his multiple factor
analysis, a very close agreement is found between the group factors
obtained here and Thurstone’s verbal descriptions previously
published.


1. Introduction 


Professor Thurstone has recently described¹ some of his preliminary
analyses of a battery of fifty-seven tests given to 240 students.
Before this description appeared, he was kind enough to furnish us 
with his tetrachoric correlations for independent analysis by the Bi- 
factor method. The striking agreement between our pattern and the 
verbal description of the factor allocations by Professor Thurstone 
makes a more complete numerical comparison interesting and signifi- 
cant. We therefore propose to present our analysis to be compared 
later with one or more of his factorizations. 

The correlations employed are of the tetrachoric form. Strictly 
speaking, the factorial algebra does not apply to such coefficients, 
since it has been worked out in terms of product-moment correlations. 
The tetrachoric values, however, may be regarded as rough approxi- 
mations to the product-moment values, and we shall proceed with the 
analysis as if they were such coefficients. The sampling error for the 
tetrachoric values is of course larger than for the Pearson coeffici- 
ents, and this will be allowed for in testing the final residuals. 

In Table I we have presented the complete set of intercorrela- 
tions. These have been given to two decimals which is adequate for 
the size of the sample and sampling error of the coefficients. All sub- 
sequent work will be carried to two places, the decimal point being 
omitted throughout to save space in the tables when there is no am- 
biguity. 


¹L. L. Thurstone, “The Factorial Isolation of Primary Abilities,” Psycho- 
metrika, 1936, 1, No. 3, pp. 175-182. The present analysis was made immediately 
after the appearance of this article. The publication of our article was properly 
postponed until the appearance of Professor Thurstone’s numerical solution. 


2. Properties of the B-Coefficient 


Before proceeding further with the analysis we may define and 
note some characteristics of the B-coefficient.¹ Briefly, this coeffici- 
ent is the average of the intercorrelations of a certain group of tests 
divided by their average correlation with all remaining tests. It gives 
a measure of the extent to which this group of tests belong together 
in ascertaining an underlying factor. 

We shall define the B-coefficient more rigorously now, and in so 
doing shall use the following notation: 

$k$ = number of tests in the argument of $B$; 

$n$ = total number of tests; 

Roman subscripts run over the range $1, 2, \ldots, k$; 

Greek subscripts run over the range $1, 2, \ldots, n$; 

$x_i$ = $i$th test in the argument of $B$ (not test number $i$); 

$x_\alpha$ = $\alpha$th test in the total ordered group of tests (not test number $\alpha$); 

$S = \sum_{i<j} r_{x_i x_j}$ = sum of intercorrelations of tests in $B$; 

$T = \sum_{i,\,\alpha;\; x_\alpha \neq x_i} r_{x_i x_\alpha} - 2\sum_{i<j} r_{x_i x_j}$ = sum of remaining correlations of tests in the 
argument of $B$ with all other tests. 

The B-coefficient is then defined as 

$$B(x_1, x_2, \ldots, x_j, \ldots, x_k) = \frac{S \,/\, \tfrac{1}{2}k(k-1)}{T \,/\, k(n-k)} = \frac{2(n-k)\sum_{i<j} r_{x_i x_j}}{(k-1)\left[\sum_{i,\,\alpha;\; x_\alpha \neq x_i} r_{x_i x_\alpha} - 2\sum_{i<j} r_{x_i x_j}\right]}.$$

Since the B-coefficient is the ratio of two averages its properties 
may be studied by means of them. The average of the intercorrela- 
tions tends to decrease as the number of tests in B increases since the 
tests are added on the basis of highest correlation with tests already 
in the argument of B. Similarly, the average of the remaining corre- 
lations tends to decrease with an increase in k. The decrease in the 
average of intercorrelations, however, is relatively greater than that 
of the remaining correlations, and hence the B-coefficient decreases 
in general. 

An exception to this may occur with the addition of a test to the 
argument of B which has relatively high intercorrelations with the 


¹First introduced in Preliminary Report on Spearman-Holzinger Unitary 
Trait Study, No. 7. Prepared at the Statistical Laboratory, Department of Edu- 
cation, University of Chicago, 1936. 
TABLE II 
ALLOCATION OF TESTS INTO GROUPS 

B(x_1, x_2, ..., x_j, ..., x_k) | 100B | Notes 
B(4,5) | 232 | 
B(4,5,60) | 206 | 
B(4,5,60,58) | 219 | 
B(4,5,60,58,11) | 211 | 
B(4,5,60,58,11,10) | 201 | 
B(4,5,60,58,11,10,16) | 197 | 
B(4,5,60,58,11,10,16,52) | 194 | 
B(4,5,60,58,11,10,16,52,57) | 188 | 
B(4,5,60,58,11,10,16,52,57,7) | 184 | 
B(4,5,60,58,11,10,16,52,57,7,6) | 179 | 
B(4,5,60,58,11,10,16,52,57,7,6,26) | 172 | (1) 
B(4,5,60,58,11,10,16,52,57,7,6,45) | 171 | (2) 
B(4,5,60,58,11,10,16,52,57,7,6,56) | 180 | 
B(4,5,60,58,11,10,16,52,57,7,6,56,55) | 178 | 
B(4,5,60,58,11,10,16,52,57,7,6,56,55,59) | 168 | (1) 
B(4,5,60,58,11,10,16,52,57,7,6,56,55,14) | 175 | 
B(4,5,60,58,11,10,16,52,57,7,6,56,55,14,54) | 170 | (2) 
B(4,5,60,58,11,10,16,52,57,7,6,56,55,14,45) | 169 | (2) 
B(4,5,60,58,11,10,16,52,57,7,6,56,55,14,12) | 171 | 
B(4,5,60,58,11,10,16,52,57,7,6,56,55,14,12,13) | 168 | 
B(4,5,60,58,11,10,16,52,57,7,6,56,55,14,12,13,15) | 165 | 
B(4,5,60,58,11,10,16,52,57,7,6,56,55,14,12,13,15,9) | 166 | 
B(4,5,60,58,11,10,16,52,57,7,6,56,55,14,12,13,15,9,26) | 161 | (1) 
B(4,5,60,58,11,10,16,52,57,7,6,56,55,14,12,13,15,9,54) | 163 | 
B(4,5,60,58,11,10,16,52,57,7,6,56,55,14,12,13,15,9,54,45) | 159 | 
B(4,5,60,58,11,10,16,52,57,7,6,56,55,14,12,13,15,9,54,45,26) | 155 | (3) 
B(4,5,60,58,11,10,16,52,57,7,6,56,55,14,12,13,15,9,54,45,59) | 154 | (3) 
B(40,42) | 222 | 
B(40,42,37) | 167 | (4) 
B(40,42,41) | 152 | (4) 
B(21,24) | 194 | 
B(21,24,19) | 189 | 
B(21,24,19,28) | 170 | (2) 
B(21,24,19,22) | 186 | 
B(21,24,19,22,18) | 175 | (2) 
B(21,24,19,22,23) | 182 | 
B(21,24,19,22,23,20) | 193 | 
B(21,24,19,22,23,20,18) | 190 | 
B(21,24,19,22,23,20,18,17) | 191 | 
B(21,24,19,22,23,20,18,17,8) | 188 | 
B(21,24,19,22,23,20,18,17,8,53) | 186 | 
B(21,24,19,22,23,20,18,17,8,53,27) | 170 | 
B(21,24,19,22,23,20,18,17,8,53,27,28) | 186 | 
B(21,24,19,22,23,20,18,17,8,53,27,28,25) | 181 | 
B(21,24,19,22,23,20,18,17,8,53,27,28,25,43) | 177 | (1) 
B(21,24,19,22,23,20,18,17,8,53,27,28,25,29) | 178 | (5) 
B(21,24,19,22,23,20,18,17,8,53,27,28,25,29,43) | 174 | (6) 
B(41,43) | 164 | 
B(41,43,44) | 164 | 
B(41,43,44,30) | 155 | (7) 
B(41,43,44,39) | 150 | (7) 
B(37,39) | 196 | 
B(37,39,30) | 159 | (2) 
B(37,39,35) | 189 | 
B(37,39,35,30) | 167 | (2) 
B(37,39,35,38) | 179 | 
B(37,39,35,38,34) | 181 | 
B(37,39,35,38,34,30) | 170 | (2) 
B(37,39,35,38,34,33) | 183 | 
B(37,39,35,38,34,33,32) | 180 | 
B(37,39,35,38,34,33,32,31) | 186 | 
B(37,39,35,38,34,33,32,31,30) | 186 | 
B(37,39,35,38,34,33,32,31,30,36) | 165 | (8) 
B(47,49) | 173 | 
B(47,49,46) | 169 | 
B(47,49,46,48) | 159 | (2) 
B(47,49,46,50) | 165 | 
B(47,49,46,50,51) | 127 | (1) 
B(47,49,46,50,48) | 161 | 
B(47,49,46,50,48,51) | 140 | (9) 

NOTES ON TABLE II 

(1) Rejected because of large drop in B. 

(2) Test omitted temporarily; it reappears in group later. 

(3) Tests 26 and 59 cause a sufficient drop in B for their rejection from this 
group. Furthermore these tests are not of the same general character as 
those in the “verbal” group, namely, tests 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 
16, 45, 52, 54, 55, 56, 57, 58, and 60. 

(4) “Logical reasoning” group composed of doublet, 40 and 42. Tests 37 and 
41 rejected because of great difference in B. 

(5) Test 29 retained because of its spatial character which is in harmony with 
the remaining tests in the group. 

(6) Test 43 rejected because of drop in B and its composition. The “spatial” 
group consists of tests 8, 17, 18, 19, 20, 21, 22, 23, 24, 25, 27, 28, 29 and 53. 

(7) Tests 30 and 39 omitted because of drop in B and their numerical character. 
The “analogies” group consists of tests 41, 43, 44. 

(8) “Arithmetical” group composed of tests 30, 31, 32, 33, 34, 35, 37, 38 and 
39. Test 36 rejected because of wide difference in B. 

(9) Although test 51 seems to be of the same general nature as the other tests 
in the group, the sudden drop in B does not warrant its retention within 
the group. Hence, the “memory” group consists of tests 46, 47, 48, 49 
and 50. 


preceding tests, but a low total of all correlations. In this case the 
decrease in the average of the intercorrelations is relatively smaller 
than that of the remaining correlations, and B increases.¹ Similar 
reasoning accounts for the fact that a test can be rejected from a 
group temporarily and then appear in the group later.² 


¹A good example of this phenomenon is found in Section 3 where the addi- 
tion of test 20 to the “spatial” group increases B. From Table II it will be ob- 
served that the B-coefficient rises from 182 to 198 upon the addition of test 20. 

As the number of tests in B increases, the decrease in the above 
averages becomes less and these averages tend toward stability. A 
consequence of this is that an actual difference between two succes- 
sive B values has a greater relative importance as the number of tests 
in B increases. 

3. The Preliminary Allocation of Tests to Groups 

The Bi-factor analysis is begun by computing the B-coefficients. 
In this analysis we have used 100B to avoid decimals. The values of 
these coefficients with notes are presented in Table II. 

We begin the computation of B-coefficients by selecting the larg- 
est correlation from Table I. This yields 100B(4, 5) = 232. Next 
test 60 is selected because it has a higher correlation with test 4 or 5 
than any other in the table. The work is continued in this manner 
until test 26 is added. A drop of seven points in the coefficient is con- 
sidered sufficient reason for dropping this test, and similarly in the 
case of test 45, although the latter reappears in the group near the 
end. The group is closed with the rejection of tests 26 and 59 as ex- 
plained in the note because of the drop in B and the nature of these 
two tests. The first group is tentatively described as “verbal”. 
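The stepwise procedure just described can be sketched in Python under simplifying assumptions. The function names, the rule of choosing the candidate with the largest summed correlation with the tests already in the group, and the fixed stopping threshold `max_drop` (.07, echoing the seven-point drop in 100B mentioned above) are all illustrative choices. The analysis reported here also weighs the nature of each test and allows temporary omissions, which this sketch does not attempt to reproduce.

```python
import numpy as np

def b_coefficient(R, group):
    """Mean within-group correlation divided by mean correlation with the rest."""
    R = np.asarray(R, dtype=float)
    group = list(group)
    n, k = R.shape[0], len(group)
    rest = [t for t in range(n) if t not in group]
    sub = R[np.ix_(group, group)]
    S = (sub.sum() - np.trace(sub)) / 2.0
    T = R[np.ix_(group, rest)].sum()
    return (S / (k * (k - 1) / 2.0)) / (T / (k * (n - k)))

def grow_group(R, max_drop=0.07):
    """Greedy sketch: seed with the most highly correlated pair, then add tests
    until B falls by more than `max_drop` (an assumed, fixed stopping rule)."""
    R = np.asarray(R, dtype=float)
    n = R.shape[0]
    i, j = max(((a, b) for a in range(n) for b in range(a + 1, n)),
               key=lambda p: R[p[0], p[1]])
    group = [i, j]
    b = b_coefficient(R, group)
    candidates = [t for t in range(n) if t not in group]
    while candidates and len(group) < n - 1:
        # Assumed selection rule: largest summed correlation with current members.
        best = max(candidates, key=lambda t: R[t, group].sum())
        new_b = b_coefficient(R, group + [best])
        if b - new_b > max_drop:
            break                      # large drop in B: reject and close the group
        group.append(best)
        candidates.remove(best)
        b = new_b
    return group, b

# Hypothetical six-test matrix with two clusters (not data from Table I).
R = np.full((6, 6), 0.2)
np.fill_diagonal(R, 1.0)
for (a, b_), r in {(0, 1): 0.70, (0, 2): 0.60, (1, 2): 0.65,
                   (3, 4): 0.60, (3, 5): 0.55, (4, 5): 0.50}.items():
    R[a, b_] = R[b_, a] = r

group, b = grow_group(R)
print(group, round(100 * b))
```

In this toy example the first cluster is recovered and the group closes when a test from the second cluster would cause a large drop in B.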

A new group is now formed using tests 40 and 42. When other 
tests are added to this group the drop in B is so great that we regard 
the “logical reasoning” factor as a “doublet” and proceed to another 
group. This is begun with tests 21 and 24 and continued as described 
in the notes until test 43 is rejected. The group appears to be “spa- 
tial”. 

The next group starts with tests 41 and 43, but other tests be- 
yond 44 are rejected because of their nature and the drop in B val- 
ues. The name “analogies” has been temporarily used here. The re- 
maining two groups are identified as shown by the table and notes. 
They may be called “arithmetical” and “memory” respectively. 

From the preliminary analysis of the B-coefficients and refer- 
ence to the nature of the tests themselves, all tests have been allocated 
to one of six groups with the exception of tests 26, 36, 51 and 59. 


²The rejection of test 28 as the fourth test in the “spatial” group and its 
retention later as the twelfth test is an example to be found in Section 3. 


4. The Modified Bi-Factor Pattern 


TABLE III 

General Factor Loadings by Preliminary Analysis 

Test | u_0 | Test | u_0 | Test | u_0 
4 | 54 | 23 | 52 | 42 | 64 
5 | 65 | 24 | 58 | 43 | 87 
6 | 81 | 25 | 53 | 44 | 77 
7 | [illegible] | 26 | 44 | 45 | 77 
8 | [illegible] | 27 | 38 | 46 | 36 
9 | 25 | 28 | 59 | 47 | 50 
10 | 60 | 29 | 59 | 48 | 43 
11 | 64 | 30 | 69 | 49 | 46 
12 | 57 | 31 | 31 | 50 | 3[?] 
13 | 48 | 32 | 38 | 51 | 33 
14 | 66 | 33 | 35 | 52 | 53 
15 | 39 | 34 | 46 | 53 | 32 
16 | 58 | 35 | 57 | 54 | 34 
17 | 41 | 36 | 34 | 55 | 64 
18 | 51 | 37 | 64 | 56 | 44 
19 | 54 | 38 | 49 | 57 | 61 
20 | 36 | 39 | 69 | 58 | 36 
21 | 67 | 40 | 68 | 59 | 26 
22 | 64 | 41 | [illegible] | 60 | 70 


After the preliminary groups of tests have been determined, the 
next step in the analysis is the calculation of the weights for the gen- 
eral factor $u_0$. This we have done as described in Report 7, and their 
values are given in Table III. We then computed the residual correla- 
tions, 


$$r'_{x_i x_j} = r_{x_i x_j} - r_{x_i u_0}\, r_{x_j u_0}.$$


In order to save space the table of these residuals has been omitted 
here, and instead, portions of this table will be presented when neces- 
sary. An examination of the residual correlations shows the necessity 
for modifying the original Bi-factor Pattern. 

Most of the groups appear to have been properly selected because 
of the small residuals with other tests and larger clusters among them- 
selves. The residuals for the “verbal” group, however, indicate that 
a re-allocation of tests is necessary. In Table IV we present the resi- 
dual intercorrelations among the “verbal” group. 

First, tests 14 and 45 have negligible residuals with the “verbal” 
group and are therefore omitted. Next we note that tests 54 and 55 
have a high residual intercorrelation, but small irregular correlations 
with the remaining tests in the “verbal” group. Hence, we assume 
that the “doublet” is measuring some other factor such as “rhythm”. 
Finally we observe that tests 9, 10, 11, 12, 13 and 15 have high inter- 
correlations. Tests 9, 10 and 11 also have appreciably high intercor- 
relations with the remaining tests in the “verbal” group, while tests 
12, 13 and 15 have low intercorrelations with this group. We assume, 
therefore, that tests 9, 10, 11, 12, 13 and 15 form another group, say 
“completion”, and that tests 9, 10 and 11 measure the “verbal” factor 
also. 

The second factor plan as far as the “verbal” group is concerned 
may then be written in the form as shown in Table V. The crosses 
indicate appreciable factor weightings. 

The only other necessary revision in the factor pattern arises in 
the introduction of a new group factor which we may call “imagina- 
tion”. The residual correlations among tests 6, 14, 26, 51 and 59 have 
been reproduced in the small Table VI. These values are all positive 
and are relatively high as compared with their correlations with the 
remaining tests. 

None of these tests except 6 has been allocated at this stage to 
another group by the B-coefficients. We observe again the residual 
correlations of test 6 with the tests in the “verbal” group and find 
that they are negligible and so consider test 6 to measure $u_0$ and 
“imagination”. 

The final factor plan thus includes seven group factors which we 
shall designate as follows: 


v = “verbal”, 

i = “imagination”, 

s = “spatial”, 

c = “completion”, 

m = “arithmetical”, 

a = “analogies”, 

o = “memory”. 


In addition to these groups we have two “doublets”: 


l = “logical reasoning”, 

r = “rhythm”. 


Under the new hypothesis every test has been assigned to some 
group except tests 36 and 45. The assumption on these tests is that 
they measure only $u_0$ and specifics. 

From this new allocation of tests we proceed to recalculate the 
weights of the general factor $u_0$. The values are given in the final fac- 
tor plan of Table VII. It will be observed that the values from the 
first column of this table are in close agreement with those of Table 
III. 

Residuals with the general factor removed are given in Table 
VIII. The tests have been arranged so that the groups may be iden- 
tified more conveniently. 








TABLE IV 
Residual Intercorrelations of Preliminary “Verbal” Group 

Test   4   5   6   7   9  10  11  12  13  14  15  16  45  52  54  55  56  57  58 
   5  48 
   6  10  10 
   7  15  20  25 
   9 [?]  33  47  21 
  10  15  22  03  19  34 
  11  28  22  08  22  27 [?] 
  12  04  05 -17 -05  02  17  21 
  13  01  08 -08  12  25  28  15  21 
  14  01 -01  11  17  11  20  18 -01 -04 
  15 -07  03 -08 -03  27  14  10  24  37 -04 
  16  15  16  01  12  22  37 [?] [?]  18  08  05 
  45  10  01  03 -01  09 -02 -10 -09  06 -20 -11 -05 
  52  22  28  08  12  30  37  12 -05  06  02 -05  18 -03 
  54  17  09  08 -12  12  15  01  06  05  09  21  10 [?]  16 
  55  06  04  09  19  09  11 -01  02  06  09  06  14  05  09  34 
  56  16  13  12  17  20  31  15  21  19  05  20  20 -01 [?]  15 -18 
  57  09  18 -01  08  18  32  10  11  19  05  29  26 -04  14  16  22  36 
  58  38  52  09  28  41  44  58  14  14  06  06  25  05  37  26  15  44  14 
  60  34  30  10  28 [?]  19  32  22  14 -08  18  27 -11  20  02  14  31  19  57 
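TABLE V 
New Hypothesis on Original “Verbal” Group 

Tests | u_0 | v | c | r 
4 | x | x | | 
5 | x | x | | 
6 | x | x | | 
7 | x | x | | 
9 | x | x | x | 
10 | x | x | x | 
11 | x | x | x | 
12 | x | | x | 
13 | x | | x | 
14 | x | | | 
15 | x | | x | 
16 | x | x | | 
45 | x | | | 
52 | x | x | | 
54 | x | | | x 
55 | x | | | x 
56 | x | x | | 
57 | x | x | | 
58 | x | x | | 
60 | x | x | | 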
TABLE VI 

Residual Intercorrelations Among Tests 
in “Imagination” Group 

Test   6  14  26  51  59 
   6 
  14  11 
  26  35  11 
  51  02  26  13 
  59  17  14  11  37 
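TABLE VII 
FACTOR PATTERN 
Factors with Coefficients 

Test | u_0 | v | i | s | c | m | l | a | o | r | Specifics | Total Variance 
4 | .56 | .47 | | | | | | | | | .68 | 1.00 
5 | .66 | .54 | | | | | | | | | .52 | 1.00 
6 | .82 | | .35 | | | | | | | | .45 | 1.00 
7 | .65 | .28 | | | | | | | | | .71 | 1.01 
8 | .45 | | | .42 | | | | | | | .79 | 1.00 
9 | .27 | .50 | | | .2[?] | | | | | | .79 | 1.00 
10 | .62 | .54 | | | .28 | | | | | | .50 | 1.00 
11 | .64 | .51 | | | .22 | | | | | | .53 | 1.00 
12 | .56 | | | | .41 | | | | | | .72 | 1.00 
13 | .48 | | | | .68 | | | | | | .55 | 1.00 
14 | .66 | | .35 | | | | | | | | .66 | .99 
15 | .39 | | | | .60 | | | | | | .70 | 1.00 
16 | .59 | .38 | | | | | | | | | .71 | 1.00 
17 | .40 | | | .56 | | | | | | | .73 | 1.01 
18 | .51 | | | .58 | | | | | | | .64 | 1.01 
19 | .54 | | | .47 | | | | | | | .70 | 1.00 
20 | .36 | | | .72 | | | | | | | .59 | 1.00 
21 | .67 | | | .55 | | | | | | | .50 | 1.00 
22 | .53 | | | .54 | | | | | | | .65 | 1.00 
23 | .52 | | | .48 | | | | | | | .71 | 1.00 
24 | .57 | | | .50 | | | | | | | .65 | 1.00 
25 | .52 | | | .31 | | | | | | | .80 | 1.00 
26 | .41 | | .43 | | | | | | | | .80 | .99 
27 | .38 | | | .52 | | | | | | | .76 | .99 
28 | .58 | | | .36 | | | | | | | .73 | 1.00 
29 | .58 | | | .27 | | | | | | | .77 | 1.00 
30 | .68 | | | | | .41 | | | | | .61 | 1.00 
31 | .30 | | | | | .62 | | | | | .72 | .99 
32 | .39 | | | | | .54 | | | | | .75 | 1.01 
33 | .35 | | | | | .74 | | | | | .57 | 1.00 
34 | .46 | | | | | .64 | | | | | .62 | 1.01 
35 | .57 | | | | | .42 | | | | | .71 | 1.01 
36 | .34 | | | | | | | | | | .94 | 1.00 
37 | .63 | | | | | .27 | | | | | .73 | 1.00 
38 | .48 | | | | | .43 | | | | | .76 | .99 
39 | .68 | | | | | .42 | | | | | .60 | 1.00 
40 | .68 | | | | | | .58 | | | | .45 | 1.00 
41 | .81 | | | | | | | [illegible] | | | .56 | 1.00 
42 | .64 | | | | | | .58 | | | | .50 | 1.00 
43 | .86 | | | | | | | [illegible] | | | .50 | 1.00 
44 | .77 | | | | | | | .28 | | | .58 | 1.00 
45 | .75 | | | | | | | | | | .66 | 1.00 
46 | .36 | | | | | | | | .48 | | .80 | 1.00 
47 | .51 | | | | | | | | .57 | | .64 | .99 
48 | .42 | | | | | | | | .36 | | .83 | .99 
49 | .47 | | | | | | | | .33 | | .82 | 1.00 
50 | .36 | | | | | | | | .46 | | .81 | 1.00 
51 | .30 | | .48 | | | | | | | | .82 | .99 
52 | .54 | .42 | | | | | | | | | .73 | 1.00 
53 | .32 | | | .45 | | | | | | | .83 | .99 
54 | .40 | | | | | | | | | .53 | .75 | 1.00 
55 | .70 | | | | | | | | | .53 | .48 | 1.00 
56 | .46 | .49 | | | | | | | | | .74 | 1.00 
57 | .63 | .30 | | | | | | | | | .72 | 1.01 
58 | .39 | .76 | | | | | | | | | .52 | 1.00 
59 | .23 | | .49 | | | | | | | | .84 | 1.00 
60 | .71 | .62 | | | | | | | | | .33 | 1.00 

Total Variance | 17.18 | 3.01 | .90 | 3.41 | 1.17 | 2.41 | .67 | .12 | 1.01 | .56 | 26.55 | 56.99 
Per Cent Variance | 30.15 | 5.28 | 1.58 | 5.98 | 2.05 | 4.23 | 1.18 | .21 | 1.77 | .98 | 46.59 | 100.00 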
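The bookkeeping behind the last column and the last row of Table VII can be checked with a short Python sketch. It assumes, as is standard for an orthogonal factor pattern, that a test's total variance is the sum of its squared coefficients (including the specific), and that the per cent row is each column total divided by the 57 tests. The variable names are illustrative.

```python
# Coefficients of test 5 as read from Table VII:
# general factor, "verbal" group factor, and specifics.
loadings_test_5 = [0.66, 0.54, 0.52]

# Total variance of a test = sum of squared coefficients.
total_variance = sum(a * a for a in loadings_test_5)
print(round(total_variance, 2))

# Per cent of variance for a factor = column total / number of tests.
N_TESTS = 57
verbal_total = 3.01          # "Total Variance" entry for the verbal factor
percent = 100 * verbal_total / N_TESTS
print(round(percent, 2))
```

Both results agree with the tabled entries for test 5 (1.00) and for the verbal factor (5.28 per cent).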
TABLE VIII 
FINAL RESIDUAL CORRELATIONS 
(Below diagonal: residuals with only the general factor removed. Above diagonal: residuals with the group factors also eliminated.) 
The weights of the group factors are determined by the use of 
Professor Spearman’s 1914 formula, and the factors then removed 
in any order by means of the formula 


$$r'_{x_i x_j} = r_{x_i x_j} - r_{x_i z}\, r_{x_j z},$$


where $z$ is any one of the factors. Residuals with the group factors 
eliminated have been printed above the main diagonal opposite the 
corresponding residuals with only $u_0$ removed in Table VIII. 
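The removal of a factor can be sketched numerically as follows. This is a minimal Python illustration, assuming orthogonal factors; the function name `remove_factor` and the three-test loadings are hypothetical and are not taken from Tables VII or VIII. The diagonal is set to zero only because the diagonal entries play no part in the residual tables.

```python
import numpy as np

def remove_factor(R, loadings):
    """Subtract the product of two tests' loadings on one factor from
    every correlation, leaving the residual correlations."""
    a = np.asarray(loadings, dtype=float)
    residual = np.asarray(R, dtype=float) - np.outer(a, a)
    np.fill_diagonal(residual, 0.0)   # diagonal entries are not used
    return residual

# Hypothetical pattern: a general factor on all three tests and one
# group factor z shared by the first two tests only.
general = np.array([0.8, 0.7, 0.6])
group_z = np.array([0.5, 0.4, 0.0])
R = np.outer(general, general) + np.outer(group_z, group_z)
np.fill_diagonal(R, 1.0)

after_general = remove_factor(R, general)          # only the general factor removed
after_group = remove_factor(after_general, group_z)  # group factor removed as well

print(np.round(after_general, 2))
print(np.round(after_group, 2))
```

With only the general factor removed, the first two tests retain a residual of .20 (the product .5 x .4 of their group loadings); once the group factor is also removed, all residuals vanish.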

In order to test the goodness of fit of the modified pattern to the 
whole set of correlations, a frequency distribution of the final resi- 
duals has been made as shown in Table IX. The standard deviation 
of these is .098 and .6745σ = .066. The probable error of a zero tetra- 
choric correlation is .072. These two values agree when rounded to two 
decimal places (both give .07), and hence the factor pattern may be 
regarded as a satisfactory fit. 
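The arithmetic of this check can be retraced in a few lines of Python. The numerical inputs are those quoted in the text and in Table IX; the variable names are illustrative, and the quartile labels Q3 and Q1 are assumed from the table.

```python
# Number of distinct residual correlations among 57 tests.
n_tests = 57
n_residuals = n_tests * (n_tests - 1) // 2

# Probable error implied by the standard deviation of the final residuals.
sd_residuals = 0.098
pe_residuals = 0.6745 * sd_residuals

# Probable error of a zero tetrachoric correlation, as quoted in the text.
pe_zero_tetrachoric = 0.072

# Quartile deviation from the quartiles listed in Table IX.
q3, q1 = 0.068, -0.060
quartile_deviation = (q3 - q1) / 2

print(n_residuals)
print(round(pe_residuals, 3), round(pe_zero_tetrachoric, 2))
print(round(quartile_deviation, 3))
```

The count of residuals equals the total frequency of Table IX, and the quartile deviation (.064) is close to the value .066 obtained from the standard deviation.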


5. Comparison of the Bi-Factor Pattern 
with a Multiple Factor Analysis 


From Professor Thurstone’s preliminary analysis cited above, 
we may make a comparison of the corresponding factor loadings by 
the two methods. In Table X the factors have been arranged in the 
order of significance as stated by Thurstone; a single cross indicating 
what he calls an “appreciable” loading and a double cross designating 
a “high” factor loading. We have also included the names and sym- 
bols employed in both analyses. 

In the case of the “number” or “arithmetical” factor the agree- 
ment is perfect and almost so in the case of the “spatial” factor. The 
“memory” factor also reveals remarkable agreement. When we come 
to the “verbal” factors, the agreement although not perfect is re- 
markably close. 

We do not find such perfect results on comparing the less promi- 
nent factors. Our “imagination” factor is quite comparable to the 
“perceptual speed” of Professor Thurstone’s analysis. The “induc- 
tion” factor has no counterpart in the Bi-factor analysis while the 
“analogies” and “rhythm” factors are not represented in the Multiple 
Factor analysis. Finally, the “deduction” factor, although minor in 
significance, agrees perfectly in its conspicuous loadings with our 
“logical reasoning.” 

A formal difference in the two analyses occurs in the case of the 


¹See Preliminary Report on Spearman-Holzinger Unitary Trait Study, No. 
2, equation (6). 
TABLE IX 
FREQUENCY DISTRIBUTIONS OF FINAL RESIDUAL CORRELATIONS 

Value of Residual | Frequency 
 .325 to  .345 | 1 
 .305 to  .325 | 1 
 .285 to  .305 | - 
 .265 to  .285 | 3 
 .245 to  .265 | 6 
 .225 to  .245 | 17 
 .205 to  .225 | 10 
 .185 to  .205 | 22 
 .165 to  .185 | 25 
 .145 to  .165 | 34 
 .125 to  .145 | 54 
 .105 to  .125 | 69 
 .085 to  .105 | 84 
 .065 to  .085 | 88 
 .045 to  .065 | 99 
 .025 to  .045 | 123 
 .005 to  .025 | 155 
-.015 to  .005 | 133 
-.035 to -.015 | 124 
-.055 to -.035 | 124 
-.075 to -.055 | 107 
-.095 to -.075 | 82 
-.115 to -.095 | 63 
-.135 to -.115 | 50 
-.155 to -.135 | 41 
-.175 to -.155 | 29 
-.195 to -.175 | 19 
-.215 to -.195 | 11 
-.235 to -.215 | 12 
-.255 to -.235 | 5 
-.275 to -.255 | 3 
-.295 to -.275 | 1 
-.315 to -.295 | - 
-.335 to -.315 | [?] 
-.355 to -.335 | 1 
Total | 1596 

Mean = .004 
Standard Deviation = .098 
.6745 × S.D. = .066 
Probable Error of Zero Correlation = .072 
Q3 = .068 
Q1 = -.060 
Quartile Deviation = .064 











general factor which we obtain and which Professor Thurstone ap- 
parently does not. The presence of this factor in our pattern is due 
to our hypothesis of its existence and the essentially positive correla- 
tions throughout, which afford a basis for the evaluation of $u_0$. It can 
be shown that each of the group factors in the multiple factor analy- 
sis can be expressed as a linear function of the corresponding group 
factor and the general factor of the Bi-factor analysis. We have 
shown elsewhere¹ how to obtain the exact mathematical relationships 
between the factors of various multiple factor solutions and those of 
the Bi-factor solution. We plan to show these algebraic relationships 
between the factors of the present study as soon as the numerical so- 
lution of the multiple factor analysis is available. 


¹Holzinger, K. J., and Harman, H. H., “Relationship between Factors ob- 
tained from Certain Analyses,” The Journal of Educational Psychology, May, 
1937, pp. 321-346. 


PSYCHOMETRIKA 


TABLE X 
COMPARISON OF FACTOR LOADINGS 
xx = high factor loading 
x = appreciable loading 











